[Pw_forum] mpi error using pw.x
Rolly Ng
rollyng at gmail.com
Mon May 16 15:43:09 CEST 2016
Hi Chong Wang,
Perhaps it would be better to run ./configure with
./configure CC=icc CXX=icpc F90=ifort F77=ifort MPIF90=mpiifort --with-scalapack=intel
so that QE knows which compilers to use (verified with QE v5.3.0).
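
A rough sketch of the full rebuild (the compilervars.sh path is only an
example; adapt it to your own Intel installation or environment modules):

  # load the Intel compiler, MPI and MKL environment first
  source /opt/intel/compilers_and_libraries_2016.3.210/linux/bin/compilervars.sh intel64
  cd /home/wangc/temp/espresso-5.4.0
  make veryclean      # discard objects and the old gfortran-based configuration
  ./configure CC=icc CXX=icpc F90=ifort F77=ifort MPIF90=mpiifort --with-scalapack=intel
  grep -E 'MPIF90|DFLAGS|BLAS_LIBS|SCALAPACK_LIBS' make.sys   # should no longer show -D__GFORTRAN or -lmkl_gf_lp64
  make pw
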
Rolly
On 05/16/2016 05:52 PM, Paolo Giannozzi wrote:
> On Mon, May 16, 2016 at 4:11 AM, Chong Wang <ch-wang at outlook.com> wrote:
>
> I have checked that my mpif90 calls gfortran, so there's no mix-up.
>
>
> I am not sure it is possible to use gfortran together with intel mpi.
> If you have intel mpi and mkl, presumably you have the intel compiler
> as well.
>
> Can you kindly share with me your make.sys?
>
>
> it doesn't make sense to share a make.sys file unless the software
> configuration is the same.
>
> Paolo
>
>
> Thanks in advance!
>
>
> Best!
>
>
> Chong Wang
>
> ------------------------------------------------------------------------
> *From:* pw_forum-bounces at pwscf.org on behalf of Paolo Giannozzi
> <p.giannozzi at gmail.com>
> *Sent:* Monday, May 16, 2016 3:10 AM
> *To:* PWSCF Forum
> *Subject:* Re: [Pw_forum] mpi error using pw.x
> Your make.sys shows clear signs of mixup between ifort and
> gfortran. Please verify that mpif90 calls ifort and not gfortran
> (or vice versa). Configure issues a warning if this happens.
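>
> For instance, with MPICH-derived wrappers (which includes Intel MPI's mpif90;
> Open MPI uses "--showme" instead of "-show") you can check which compiler the
> wrapper invokes:
>
>     mpif90 -show        # prints the underlying compiler command line
>     mpif90 --version    # should report ifort, not GNU Fortran, for an Intel build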
>
> I have successfully run your test on a machine with some recent
> intel compiler and intel mpi. The second output (run as mpirun -np
> 18 pw.x -nk 18....) is an example of what I mean by "type of
> parallelization": there are many different parallelization levels
> in QE. This one is over k-points (and in this case runs faster on fewer
> processors than parallelization over plane waves).
>
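> Concretely, a sketch using pw.x's standard command-line options (-nk sets the
> number of k-point pools, -in names the input file; BTO.scf.in is your input):
>
>     mpirun -np 18 pw.x -nk 18 -in BTO.scf.in > output    # 18 k-point pools, one process each
>     mpirun -np 24 pw.x -in BTO.scf.in > output           # plane-wave (G-vector) parallelization only
>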
> Paolo
>
> On Sun, May 15, 2016 at 6:01 PM, Chong Wang <ch-wang at outlook.com> wrote:
>
> Hi,
>
>
> I have done more tests:
>
> 1. intel mpi 2015 yields a segmentation fault
>
> 2. intel mpi 2013 yields the same error as here
>
> Did I do something wrong with compiling? Here's my make.sys:
>
>
> # make.sys. Generated from make.sys.in by configure.
>
> # compilation rules
>
> .SUFFIXES :
> .SUFFIXES : .o .c .f .f90
>
> # most fortran compilers can directly preprocess c-like directives: use
> # $(MPIF90) $(F90FLAGS) -c $<
> # if explicit preprocessing by the C preprocessor is needed, use:
> # $(CPP) $(CPPFLAGS) $< -o $*.F90
> # $(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o
> # remember the tabulator in the first column !!!
>
> .f90.o:
> 	$(MPIF90) $(F90FLAGS) -c $<
>
> # .f.o and .c.o: do not modify
>
> .f.o:
> 	$(F77) $(FFLAGS) -c $<
>
> .c.o:
> 	$(CC) $(CFLAGS) -c $<
>
> # Top QE directory, not used in QE but useful for linking QE libs with plugins
> # The following syntax should always point to TOPDIR:
> # $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))
>
> TOPDIR = /home/wangc/temp/espresso-5.4.0
>
> # DFLAGS  = precompilation options (possible arguments to -D and -U)
> #           used by the C compiler and preprocessor
> # FDFLAGS = as DFLAGS, for the f90 compiler
> # See include/defs.h.README for a list of options and their meaning
> # With the exception of IBM xlf, FDFLAGS = $(DFLAGS)
> # For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas
>
> # MANUAL_DFLAGS = additional precompilation option(s), if desired
> #                 BEWARE: it does not work for IBM xlf! Manually edit FDFLAGS
> MANUAL_DFLAGS =
> DFLAGS = -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK
> FDFLAGS = $(DFLAGS) $(MANUAL_DFLAGS)
>
> # IFLAGS = how to locate directories with *.h or *.f90 file to be included
> #          typically -I../include -I/some/other/directory/
> #          the latter contains .e.g. files needed by FFT libraries
>
> IFLAGS = -I../include -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
>
> # MOD_FLAGS = flag used by f90 compiler to locate modules
> # Each Makefile defines the list of needed modules in MODFLAGS
>
> MOD_FLAG = -I
>
> # Compilers: fortran-90, fortran-77, C
> # If a parallel compilation is desired, MPIF90 should be a fortran-90
> # compiler that produces executables for parallel execution using MPI
> # (such as for instance mpif90, mpf90, mpxlf90,...);
> # otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)
> # If you have a parallel machine but no suitable candidate for MPIF90,
> # try to specify the directory containing "mpif.h" in IFLAGS
> # and to specify the location of MPI libraries in MPI_LIBS
>
> MPIF90 = mpif90
> #F90 = gfortran
> CC = cc
> F77 = gfortran
>
> # C preprocessor and preprocessing flags - for explicit preprocessing,
> # if needed (see the compilation rules above)
> # preprocessing flags must include DFLAGS and IFLAGS
>
> CPP = cpp
> CPPFLAGS = -P -C -traditional $(DFLAGS) $(IFLAGS)
>
> # compiler flags: C, F90, F77
> # C flags must include DFLAGS and IFLAGS
> # F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate syntax
>
> CFLAGS = -O3 $(DFLAGS) $(IFLAGS)
> F90FLAGS = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
> FFLAGS = -O3 -g
>
> # compiler flags without optimization for fortran-77
> # the latter is NEEDED to properly compile dlamch.f, used by lapack
>
> FFLAGS_NOOPT = -O0 -g
>
> # compiler flag needed by some compilers when the main program is not fortran
> # Currently used for Yambo
>
> FFLAGS_NOMAIN =
>
> # Linker, linker-specific flags (if any)
> # Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
>
> LD = mpif90
> LDFLAGS = -g -pthread
> LD_LIBS =
>
> # External Libraries (if any) : blas, lapack, fft, MPI
>
> # If you have nothing better, use the local copy :
> # BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a
> # BLAS_LIBS_SWITCH = internal
> BLAS_LIBS = -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
> BLAS_LIBS_SWITCH = external
>
> # If you have nothing better, use the local copy :
> # LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a
> # LAPACK_LIBS_SWITCH = internal
> # For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !
> # remember that LAPACK_LIBS precedes BLAS_LIBS in loading order
> LAPACK_LIBS =
> LAPACK_LIBS_SWITCH = external
>
> ELPA_LIBS_SWITCH = disabled
> SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
>
> # nothing needed here if the the internal copy of FFTW is compiled
> # (needs -D__FFTW in DFLAGS)
>
> FFT_LIBS =
>
> # For parallel execution, the correct path to MPI libraries must
> # be specified in MPI_LIBS (except for IBM if you use mpxlf)
>
> MPI_LIBS =
>
> # IBM-specific: MASS libraries, if available and if -D__MASS is defined in FDFLAGS
>
> MASS_LIBS =
>
> # ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv
>
> AR = ar
> ARFLAGS = ruv
>
> # ranlib command. If ranlib is not needed (it isn't in most cases) use
> # RANLIB = echo
>
> RANLIB = ranlib
>
> # all internal and external libraries - do not modify
>
> FLIB_TARGETS = all
>
> LIBOBJS = ../clib/clib.a ../iotk/src/libiotk.a
> LIBS = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)
>
> # wget or curl - useful to download from network
> WGET = wget -O
>
> # Install directory - not currently used
> PREFIX = /usr/local
>
>
> Cheers!
>
>
> Chong Wang
>
> ------------------------------------------------------------------------
> *From:* pw_forum-bounces at pwscf.org on behalf of Paolo Giannozzi
> <p.giannozzi at gmail.com>
> *Sent:* Sunday, May 15, 2016 8:28:26 PM
>
> *To:* PWSCF Forum
> *Subject:* Re: [Pw_forum] mpi error using pw.x
> It looks like a compiler/mpi bug, since there is nothing
> special in your input or in your execution, unless you find
> evidence that the problem is reproducible with other
> compiler/mpi versions.
>
> Paolo
>
> On Sun, May 15, 2016 at 10:11 AM, Chong Wang <ch-wang at outlook.com> wrote:
>
> Hi,
>
>
> Thank you for replying.
>
>
> More details:
>
>
> 1. input data:
>
> &control
> calculation='scf'
> restart_mode='from_scratch',
> pseudo_dir = '../pot/',
> outdir='./out/'
> prefix='BaTiO3'
> /
> &system
> nbnd = 48
> ibrav = 0, nat = 5, ntyp = 3
> ecutwfc = 50
> occupations='smearing', smearing='gaussian', degauss=0.02
> /
> &electrons
> conv_thr = 1.0e-8
> /
> ATOMIC_SPECIES
> Ba 137.327 Ba.pbe-mt_fhi.UPF
> Ti 204.380 Ti.pbe-mt_fhi.UPF
> O 15.999 O.pbe-mt_fhi.UPF
> ATOMIC_POSITIONS
> Ba 0.0000000000000000 0.0000000000000000 0.0000000000000000
> Ti 0.5000000000000000 0.5000000000000000 0.4819999933242795
> O 0.5000000000000000 0.5000000000000000 0.0160000007599592
> O 0.5000000000000000 -0.0000000000000000 0.5149999856948849
> O 0.0000000000000000 0.5000000000000000 0.5149999856948849
> K_POINTS (automatic)
> 11 11 11 0 0 0
> CELL_PARAMETERS {angstrom}
> 3.999800000000001 0.000000000000000 0.000000000000000
> 0.000000000000000 3.999800000000001 0.000000000000000
> 0.000000000000000 0.000000000000000 4.018000000000000
>
> 2. number of processors:
> I tested 24 cores and 8 cores, and both yield the same result.
>
> 3. type of parallelization:
> I am not sure what you mean. I execute pw.x with:
> mpirun -np 24 pw.x < BTO.scf.in >> output
>
> 'which mpirun' output:
> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
>
> 4. when the error occurs:
> in the middle of the run. The last few lines of the
> output are:
> total cpu time spent up to now is 32.9 secs
>
> total energy = -105.97885119 Ry
> Harris-Foulkes estimate = -105.99394457 Ry
> estimated scf accuracy < 0.03479229 Ry
>
> iteration # 7 ecut= 50.00 Ry beta=0.70
> Davidson diagonalization with overlap
> ethr = 1.45E-04, avg # of iterations = 2.7
>
> total cpu time spent up to now is 37.3 secs
>
> total energy = -105.99039982 Ry
> Harris-Foulkes estimate = -105.99025175 Ry
> estimated scf accuracy < 0.00927902 Ry
>
> iteration # 8 ecut= 50.00 Ry beta=0.70
> Davidson diagonalization with overlap
>
> 5. Error message:
> Something like:
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................:
> MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38,
> comm_new=0x7ffc03ae5e90) failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many
> communicators (0/16384 free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................:
> MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408,
> comm_new=0x7ffd10080360) failed
> PMPI_Cart_sub(178)...................:
>
> Cheers!
>
> Chong
> ------------------------------------------------------------------------
> *From:* pw_forum-bounces at pwscf.org on behalf of Paolo Giannozzi
> <p.giannozzi at gmail.com>
> *Sent:* Sunday, May 15, 2016 3:43 PM
> *To:* PWSCF Forum
> *Subject:* Re: [Pw_forum] mpi error using pw.x
> Please tell us what is wrong and we will fix it.
>
> Seriously: nobody can answer your question unless you
> specify, as a strict minimum, input data, number of
> processors and type of parallelization that trigger the
> error, and where the error occurs (at startup, later, in
> the middle of the run, ...).
>
> Paolo
>
> On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com> wrote:
>
> I compiled quantum espresso 5.4 with intel mpi and mkl
> 2016 update 3.
>
> However, when I ran pw.x the following errors were
> reported:
>
> ...
> MPIR_Get_contextid_sparse_group(1330): Too many
> communicators (0/16384 free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error
> stack:
> PMPI_Cart_sub(242)...................:
> MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30)
> failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many
> communicators (0/16384 free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error
> stack:
> PMPI_Cart_sub(242)...................:
> MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10)
> failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many
> communicators (0/16384 free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error
> stack:
> PMPI_Cart_sub(242)...................:
> MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050)
> failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many
> communicators (0/16384 free on this process; ignore_id=0)
>
> I googled and found that this might be caused by
> hitting the OS limit on the number of open files. However,
> after I increased the number of open files per process
> from 1024 to 40960, the error persists.
>
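> (The relevant limit is the per-process open-file limit, which on Linux is
> usually inspected and raised with the shell's ulimit built-in before mpirun
> is launched, e.g.
>
>     ulimit -n          # show the current soft limit
>     ulimit -n 40960    # raise it for this shell and its child processes
>
> but, as noted above, raising it did not help here.)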
>
> What's wrong here?
>
>
> Chong Wang
>
> Ph. D. candidate
>
> Institute for Advanced Study, Tsinghua University,
> Beijing, 100084
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
--
PhD. Research Fellow,
Dept. of Physics & Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax: +852 3442 0538