[QE-users] MPI error in pw.x

Alex.Durie alex.durie at open.ac.uk
Sun Jan 13 13:37:15 CET 2019


Dear all,

Sorry to dredge up an old(ish) problem, but in light of some new information I was wondering if anyone could help.

As previously reported, I am using QE v6.3 pw.x with the Intel compiler suite 16.3 on a Linux cluster with a bank of Intel Xeon processors, and I am getting strange, seemingly irreproducible errors.

If I run the code outside the bsub batch queue system (I assume this is equivalent to keeping the running blade fixed) with 7 or fewer processors, I can avoid some of the crashes, to the point that I can sometimes run nscf or bands calculations with pw.x successfully.

What I have been unable to resolve is the following crash, which occurs with post-processing tools such as bands.x or pw2wannier90.x:

    forrtl: severe (24): end-of-file during read, unit 99, file STEM/scratch.san/ad5955/Co/./co.save/wfcup1.dat
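
One quick way to see whether the .save directory itself is damaged is to look at the wavefunction file sizes. Below is a minimal Python sketch (the script and function names are hypothetical, and it assumes only the wfc*.dat naming visible in the path above): it prints each file's size and flags any that are empty. An empty file would certainly trigger an end-of-file error on read; a non-empty file can still be truncated, so this only catches the obvious case.

```python
import sys
from pathlib import Path


def find_empty_wfc_files(save_dir):
    """Return the sorted names of wfc*.dat files in save_dir that have zero size.

    A zero-size file cannot be read back, so it would explain an
    end-of-file error; a truncated but non-empty file is NOT detected here.
    """
    return sorted(
        p.name
        for p in Path(save_dir).glob("wfc*.dat")
        if p.is_file() and p.stat().st_size == 0
    )


def report_wfc_sizes(save_dir):
    """Print the size in bytes of every wfc*.dat file so outliers stand out."""
    for p in sorted(Path(save_dir).glob("wfc*.dat")):
        print(f"{p.stat().st_size:>12d}  {p.name}")


if __name__ == "__main__":
    # Hypothetical usage: python check_wfc.py co.save
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    report_wfc_sizes(directory)
    empty = find_empty_wfc_files(directory)
    if empty:
        print("Empty wavefunction files:", ", ".join(empty))
```

It would be run as, e.g., python check_wfc.py co.save. Files for different k-points legitimately differ in size because the number of plane waves varies, so only zero-size or unusually small outliers are informative.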

After a bit of searching, I found the following post https://www.mail-archive.com/users@lists.quantum-espresso.org/msg29686.html

which seems to indicate that there is a problem with the same version of the compiler suite that I am using. I was wondering whether the errors I am getting are consistent with this bug, or whether I have multiple problems at hand.

As further information, I have just recompiled QE with the flag --disable-parallel and attempted to run example08 within wannier90. Even in serial, I get the following error when attempting the nscf calculation:

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine cdiaghg (13):
     eigenvectors failed to converge
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Would one therefore expect there to be a problem with the MKL libraries as well? (As I said above, I can 'sometimes' run nscf and bands calculations successfully, and I am unsure what the cause is here.)

Here is an excerpt from the configure output with parallel disabled, in case it reveals any glaring errors:

checking build system type... x86_64-pc-linux-gnu
checking ARCH... x86_64
checking setting AR... ... ar
checking setting ARFLAGS... ... ruv
checking whether the Fortran compiler works... yes
checking for Fortran compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU Fortran compiler... no
checking whether ifort accepts -g... yes
checking for gcc... icc
checking whether we are using the GNU C compiler... yes
checking whether icc accepts -g... yes
checking for icc option to accept ISO C89... none needed
checking whether we are using the GNU Fortran compiler... (cached) no
checking whether ifort accepts -g... (cached) yes
checking for Fortran flag to compile .f90 files... none
checking version of ifort... ifort 16
setting F90... ifort
setting MPIF90... ifort
checking whether we are using the GNU C compiler... (cached) yes
checking whether icc accepts -g... (cached) yes
checking for icc option to accept ISO C89... (cached) none needed
setting CC... icc
setting CFLAGS... -O3
checking whether we are using the GNU Fortran 77 compiler... no
checking whether ifort accepts -g... yes
setting F77... ifort
using F90... ifort
setting FFLAGS... -O2 -assume byterecl -g -traceback
setting F90FLAGS... $(FFLAGS) -nomodule
setting FFLAGS_NOOPT... -O0 -assume byterecl -g -traceback
setting FFLAGS_NOMAIN... -nofor_main
setting CPP... cpp
setting CPPFLAGS... -P -traditional
setting LD... ifort
setting LDFLAGS...
checking whether make sets $(MAKE)... yes
checking whether Fortran files must be preprocessed... no
checking for library containing dgemm... -lmkl_intel_lp64
setting BLAS_LIBS... -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
checking for library containing dspev... none required
checking how to run the C preprocessor... icc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking FFT... checking for DftiComputeForward in -lmkl_intel_lp64... yes
checking for /opt/intel/Compiler/*/*/mkl/include/mkl_dfti.f90... no
checking for /opt/intel/mkl/*/include/mkl_dfti.f90... no
checking for /opt/intel/mkl*/include/mkl_dfti.f90... no
checking for /STEM/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include/mkl_dfti.f90... yes

checking MASS...
checking ELPA...
checking for ranlib... ranlib
checking for wget... wget -O
setting WGET... wget -O
setting DFLAGS... -D__DFTI
setting IFLAGS... -I$(TOPDIR)/include -I$(TOPDIR)/FoX/finclude -I$(TOPDIR)/S3DE/iotk/include/ -I/STEM/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
configure: creating ./config.status
config.status: creating install/make_lapack.inc
config.status: creating include/configure.h
config.status: creating make.inc
config.status: creating configure.msg
config.status: creating install/make_wannier90.inc
config.status: creating include/c_defs.h
--------------------------------------------------------------------
ESPRESSO can take advantage of several optimized numerical libraries
(essl, fftw, mkl...).  This configure script attempts to find them,
but may fail if they have been installed in non-standard locations.
If a required library is not found, the local copy will be compiled.

The following libraries have been found:
  BLAS_LIBS=  -lmkl_intel_lp64  -lmkl_sequential -lmkl_core
  LAPACK_LIBS=
  FFT_LIBS=

Many thanks in advance,

Alex Durie

PhD student

Open University

United Kingdom
