[Pw_forum] mpi error using pw.x

Rolly Ng rollyng at gmail.com
Mon May 16 15:43:09 CEST 2016


Hi Chong Wang,

Perhaps it would be better to run ./configure as

    ./configure CC=icc CXX=icpc F90=ifort F77=ifort MPIF90=mpiifort \
                --with-scalapack=intel

so that QE knows which compilers to use (verified with QE v5.3.0).
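
If an earlier gfortran-based build is already sitting in the tree, it may
also help to clean it out first and then check what ended up in make.sys.
A minimal sketch, assuming the Intel compiler and MPI environment scripts
have already been sourced in the current shell:

    make veryclean       # drop old objects and the previous make.sys
    ./configure CC=icc CXX=icpc F90=ifort F77=ifort MPIF90=mpiifort \
                --with-scalapack=intel
    grep -E '^(MPIF90|F77|CC) ' make.sys   # should now list the Intel tools
    make pw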

Rolly

On 05/16/2016 05:52 PM, Paolo Giannozzi wrote:
> On Mon, May 16, 2016 at 4:11 AM, Chong Wang <ch-wang at outlook.com> wrote:
>
>     I have checked my mpif90 calls gfortran so there's no mix up.
>
>
> I am not sure it is possible to use gfortran together with Intel MPI.
> If you have Intel MPI and MKL, presumably you have the Intel compiler
> as well.
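>
> (If in doubt, a quick way to check what is actually installed, assuming
> the usual Intel command names:
>
>     which ifort && ifort --version   # Intel Fortran compiler, if present
>     which mpiifort                   # Intel MPI wrapper around ifort
>
> If mpiifort is there, configuring with MPIF90=mpiifort avoids mixing
> gfortran with Intel MPI altogether.)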
>
>     Can you kindly share with me your make.sys?
>
>
> It doesn't make sense to share a make.sys file unless the software
> configuration is the same.
>
> Paolo
>
>
>     Thanks in advance!
>
>
>     Best!
>
>
>     Chong Wang
>
>     ------------------------------------------------------------------------
>     *From:* pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org>
>     on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
>     *Sent:* Monday, May 16, 2016 3:10 AM
>     *To:* PWSCF Forum
>     *Subject:* Re: [Pw_forum] mpi error using pw.x
>     Your make.sys shows clear signs of a mixup between ifort and
>     gfortran. Please verify that mpif90 calls ifort and not gfortran
>     (or vice versa); configure issues a warning if this happens.
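>
>     (A quick way to see which compiler the wrapper invokes, assuming an
>     MPICH-style wrapper such as Intel MPI's mpif90:
>
>         mpif90 -show        # prints the underlying compiler command line
>         mpif90 --version    # version banner of the compiler being wrapped
>
>     Open MPI wrappers use "-showme" instead of "-show".)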
>
>     I have successfully run your test on a machine with a recent Intel
>     compiler and Intel MPI. The second output (run as mpirun -np 18
>     pw.x -nk 18 ...) is an example of what I mean by "type of
>     parallelization": there are many different parallelization levels
>     in QE. That run is parallelized over k-points (and in this case runs
>     faster on fewer processors than parallelization over plane waves).
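>
>     (As a sketch of what k-point parallelization looks like on the
>     command line, with process counts chosen only for illustration:
>
>         mpirun -np 24 pw.x -nk 8 < BTO.scf.in > output
>
>     i.e. 8 pools of 3 processes each: k-points are distributed over the
>     pools, plane waves within each pool. "-nk" can also be written
>     "-npool".)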
>
>     Paolo
>
>     On Sun, May 15, 2016 at 6:01 PM, Chong Wang <ch-wang at outlook.com> wrote:
>
>         Hi,
>
>
>         I have done more tests:
>
>         1. Intel MPI 2015 yields a segmentation fault
>
>         2. Intel MPI 2013 yields the same error as reported here
>
>         Did I do something wrong when compiling? Here's my make.sys:
>
>
>         # make.sys.  Generated from make.sys.in by configure.
>
>         # compilation rules
>
>         .SUFFIXES :
>         .SUFFIXES : .o .c .f .f90
>
>         # most fortran compilers can directly preprocess c-like directives: use
>         # $(MPIF90) $(F90FLAGS) -c $<
>         # if explicit preprocessing by the C preprocessor is needed, use:
>         # $(CPP) $(CPPFLAGS) $< -o $*.F90
>         # $(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o
>         # remember the tabulator in the first column !!!
>
>         .f90.o:
>         	$(MPIF90) $(F90FLAGS) -c $<
>
>         # .f.o and .c.o: do not modify
>
>         .f.o:
>         	$(F77) $(FFLAGS) -c $<
>
>         .c.o:
>         	$(CC) $(CFLAGS) -c $<
>
>         # Top QE directory, not used in QE but useful for linking QE libs with plugins
>         # The following syntax should always point to TOPDIR:
>         #   $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))
>
>         TOPDIR = /home/wangc/temp/espresso-5.4.0
>
>         # DFLAGS  = precompilation options (possible arguments to -D and -U)
>         #           used by the C compiler and preprocessor
>         # FDFLAGS = as DFLAGS, for the f90 compiler
>         # See include/defs.h.README for a list of options and their meaning
>         # With the exception of IBM xlf, FDFLAGS = $(DFLAGS)
>         # For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas
>
>         # MANUAL_DFLAGS  = additional precompilation option(s), if desired
>         #                  BEWARE: it does not work for IBM xlf! Manually edit FDFLAGS
>
>         MANUAL_DFLAGS  =
>         DFLAGS         =  -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK
>         FDFLAGS        = $(DFLAGS) $(MANUAL_DFLAGS)
>
>         # IFLAGS = how to locate directories with *.h or *.f90 file to be included
>         #          typically -I../include -I/some/other/directory/
>         #          the latter contains e.g. files needed by FFT libraries
>
>         IFLAGS         = -I../include -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
>
>         # MOD_FLAG = flag used by f90 compiler to locate modules
>         # Each Makefile defines the list of needed modules in MODFLAGS
>
>         MOD_FLAG      = -I
>
>         # Compilers: fortran-90, fortran-77, C
>         # If a parallel compilation is desired, MPIF90 should be a fortran-90
>         # compiler that produces executables for parallel execution using MPI
>         # (such as for instance mpif90, mpf90, mpxlf90,...);
>         # otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)
>         # If you have a parallel machine but no suitable candidate for MPIF90,
>         # try to specify the directory containing "mpif.h" in IFLAGS
>         # and to specify the location of MPI libraries in MPI_LIBS
>
>         MPIF90         = mpif90
>         #F90           = gfortran
>         CC             = cc
>         F77            = gfortran
>
>         # C preprocessor and preprocessing flags - for explicit preprocessing,
>         # if needed (see the compilation rules above)
>         # preprocessing flags must include DFLAGS and IFLAGS
>
>         CPP            = cpp
>         CPPFLAGS       = -P -C -traditional $(DFLAGS) $(IFLAGS)
>
>         # compiler flags: C, F90, F77
>         # C flags must include DFLAGS and IFLAGS
>         # F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate syntax
>
>         CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)
>         F90FLAGS       = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
>         FFLAGS         = -O3 -g
>
>         # compiler flags without optimization for fortran-77
>         # the latter is NEEDED to properly compile dlamch.f, used by lapack
>
>         FFLAGS_NOOPT   = -O0 -g
>
>         # compiler flag needed by some compilers when the main program is not fortran
>         # Currently used for Yambo
>
>         FFLAGS_NOMAIN   =
>
>         # Linker, linker-specific flags (if any)
>         # Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
>
>         LD             = mpif90
>         LDFLAGS        =  -g -pthread
>         LD_LIBS        =
>
>         # External Libraries (if any) : blas, lapack, fft, MPI
>
>         # If you have nothing better, use the local copy :
>         # BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a
>         # BLAS_LIBS_SWITCH = internal
>
>         BLAS_LIBS      = -lmkl_gf_lp64  -lmkl_sequential -lmkl_core
>         BLAS_LIBS_SWITCH = external
>
>         # If you have nothing better, use the local copy :
>         # LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a
>         # LAPACK_LIBS_SWITCH = internal
>         # For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !
>         # remember that LAPACK_LIBS precedes BLAS_LIBS in loading order
>
>         LAPACK_LIBS    =
>         LAPACK_LIBS_SWITCH = external
>
>         ELPA_LIBS_SWITCH = disabled
>         SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
>
>         # nothing needed here if the internal copy of FFTW is compiled
>         # (needs -D__FFTW in DFLAGS)
>
>         FFT_LIBS       =
>
>         # For parallel execution, the correct path to MPI libraries must
>         # be specified in MPI_LIBS (except for IBM if you use mpxlf)
>
>         MPI_LIBS       =
>
>         # IBM-specific: MASS libraries, if available and if -D__MASS is defined in FDFLAGS
>
>         MASS_LIBS      =
>
>         # ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv
>
>         AR             = ar
>         ARFLAGS        = ruv
>
>         # ranlib command. If ranlib is not needed (it isn't in most cases) use
>         # RANLIB = echo
>
>         RANLIB         = ranlib
>
>         # all internal and external libraries - do not modify
>
>         FLIB_TARGETS   = all
>
>         LIBOBJS        = ../clib/clib.a ../iotk/src/libiotk.a
>         LIBS           = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)
>
>         # wget or curl - useful to download from network
>         WGET = wget -O
>
>         # Install directory - not currently used
>         PREFIX = /usr/local
>
>
>         Cheers!
>
>
>         Chong Wang
>
>         ------------------------------------------------------------------------
>         *From:* pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org>
>         on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
>         *Sent:* Sunday, May 15, 2016 8:28:26 PM
>
>         *To:* PWSCF Forum
>         *Subject:* Re: [Pw_forum] mpi error using pw.x
>         It looks like a compiler/MPI bug, since there is nothing
>         special in your input or in your execution, unless you find
>         evidence that the problem is reproducible with other
>         compiler/MPI versions.
>
>         Paolo
>
>         On Sun, May 15, 2016 at 10:11 AM, Chong Wang
>         <ch-wang at outlook.com> wrote:
>
>             Hi,
>
>
>             Thank you for replying.
>
>
>             More details:
>
>
>             1. input data:
>
>             &control
>             calculation='scf'
>             restart_mode='from_scratch',
>                 pseudo_dir = '../pot/',
>                 outdir='./out/'
>                 prefix='BaTiO3'
>             /
>             &system
>                 nbnd = 48
>                 ibrav = 0, nat = 5, ntyp = 3
>                 ecutwfc = 50
>             occupations='smearing', smearing='gaussian', degauss=0.02
>             /
>             &electrons
>                 conv_thr = 1.0e-8
>             /
>             ATOMIC_SPECIES
>              Ba 137.327 Ba.pbe-mt_fhi.UPF
>              Ti 204.380 Ti.pbe-mt_fhi.UPF
>              O  15.999  O.pbe-mt_fhi.UPF
>             ATOMIC_POSITIONS
>              Ba 0.0000000000000000 0.0000000000000000 0.0000000000000000
>              Ti 0.5000000000000000 0.5000000000000000 0.4819999933242795
>              O  0.5000000000000000 0.5000000000000000 0.0160000007599592
>              O  0.5000000000000000  -0.0000000000000000 0.5149999856948849
>              O  0.0000000000000000 0.5000000000000000 0.5149999856948849
>             K_POINTS (automatic)
>             11 11 11 0 0 0
>             CELL_PARAMETERS {angstrom}
>             3.999800000000001     0.000000000000000 0.000000000000000
>             0.000000000000000     3.999800000000001 0.000000000000000
>             0.000000000000000     0.000000000000000 4.018000000000000
>
>             2. number of processors:
>             I tested 24 cores and 8 cores, and both yield the same result.
>
>             3. type of parallelization:
>             I am not sure what you mean. I execute pw.x with:
>             mpirun -np 24 pw.x < BTO.scf.in >> output
>
>             'which mpirun' output:
>             /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
>
>             4. when the error occurs:
>             In the middle of the run. The last few lines of the
>             output are:
>                  total cpu time spent up to now is   32.9 secs
>
>                  total energy            =  -105.97885119 Ry
>                  Harris-Foulkes estimate   =  -105.99394457 Ry
>                  estimated scf accuracy    < 0.03479229 Ry
>
>                  iteration #  7     ecut=    50.00 Ry     beta=0.70
>                  Davidson diagonalization with overlap
>                  ethr =  1.45E-04,  avg # of iterations =  2.7
>
>                  total cpu time spent up to now is   37.3 secs
>
>                  total energy            =  -105.99039982 Ry
>                  Harris-Foulkes estimate   =  -105.99025175 Ry
>                  estimated scf accuracy    < 0.00927902 Ry
>
>                  iteration #  8     ecut=    50.00 Ry     beta=0.70
>                  Davidson diagonalization with overlap
>
>             5. Error message:
>             Something like:
>             Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>             PMPI_Cart_sub(242)...................:
>             MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38,
>             comm_new=0x7ffc03ae5e90) failed
>             PMPI_Cart_sub(178)...................:
>             MPIR_Comm_split_impl(270)............:
>             MPIR_Get_contextid_sparse_group(1330): Too many
>             communicators (0/16384 free on this process; ignore_id=0)
>             Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>             PMPI_Cart_sub(242)...................:
>             MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408,
>             comm_new=0x7ffd10080360) failed
>             PMPI_Cart_sub(178)...................:
>
>             Cheers!
>
>             Chong
>             ------------------------------------------------------------------------
>             *From:* pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org>
>             on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
>             *Sent:* Sunday, May 15, 2016 3:43 PM
>             *To:* PWSCF Forum
>             *Subject:* Re: [Pw_forum] mpi error using pw.x
>             Please tell us what is wrong and we will fix it.
>
>             Seriously: nobody can answer your question unless you
>             specify, as a strict minimum, input data, number of
>             processors and type of parallelization that trigger the
>             error, and where the error occurs (at startup, later, in
>             the middle of the run, ...).
>
>             Paolo
>
>             On Sun, May 15, 2016 at 7:50 AM, Chong Wang
>             <ch-wang at outlook.com> wrote:
>
>                 I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL
>                 2016 update 3.
>
>                 However, when I ran pw.x the following errors were
>                 reported:
>
>                 ...
>                 MPIR_Get_contextid_sparse_group(1330): Too many
>                 communicators (0/16384 free on this process; ignore_id=0)
>                 Fatal error in PMPI_Cart_sub: Other MPI error, error
>                 stack:
>                 PMPI_Cart_sub(242)...................:
>                 MPI_Cart_sub(comm=0xc400fcf3,
>                 remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30)
>                 failed
>                 PMPI_Cart_sub(178)...................:
>                 MPIR_Comm_split_impl(270)............:
>                 MPIR_Get_contextid_sparse_group(1330): Too many
>                 communicators (0/16384 free on this process; ignore_id=0)
>                 Fatal error in PMPI_Cart_sub: Other MPI error, error
>                 stack:
>                 PMPI_Cart_sub(242)...................:
>                 MPI_Cart_sub(comm=0xc400fcf3,
>                 remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10)
>                 failed
>                 PMPI_Cart_sub(178)...................:
>                 MPIR_Comm_split_impl(270)............:
>                 MPIR_Get_contextid_sparse_group(1330): Too many
>                 communicators (0/16384 free on this process; ignore_id=0)
>                 Fatal error in PMPI_Cart_sub: Other MPI error, error
>                 stack:
>                 PMPI_Cart_sub(242)...................:
>                 MPI_Cart_sub(comm=0xc400fcf3,
>                 remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050)
>                 failed
>                 PMPI_Cart_sub(178)...................:
>                 MPIR_Comm_split_impl(270)............:
>                 MPIR_Get_contextid_sparse_group(1330): Too many
>                 communicators (0/16384 free on this process; ignore_id=0)
>
>                 I googled and found that this might be caused by
>                 hitting the OS limit on the number of open files.
>                 However, after I increased the number of open files
>                 per process from 1024 to 40960, the error persists.
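>
>                 (For reference, this limit can be inspected and raised
>                 in the shell that launches mpirun, e.g.:
>
>                     ulimit -n          # show the current open-file limit
>                     ulimit -n 40960    # raise it for this shell and its children
>
>                 though, as noted, raising it did not remove the error
>                 here.)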
>
>
>                 What's wrong here?
>
>
>                 Chong Wang
>
>                 Ph. D. candidate
>
>                 Institute for Advanced Study, Tsinghua University,
>                 Beijing, 100084
>
>
>                 _______________________________________________
>                 Pw_forum mailing list
>                 Pw_forum at pwscf.org
>                 http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
>             -- 
>             Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>             Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>             Phone +39-0432-558216, fax +39-0432-558222
>
>
>             _______________________________________________
>             Pw_forum mailing list
>             Pw_forum at pwscf.org
>             http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
>         -- 
>         Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>         Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>         Phone +39-0432-558216, fax +39-0432-558222
>
>
>         _______________________________________________
>         Pw_forum mailing list
>         Pw_forum at pwscf.org
>         http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
>     -- 
>     Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>     Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>     Phone +39-0432-558216, fax +39-0432-558222
>
>
>     _______________________________________________
>     Pw_forum mailing list
>     Pw_forum at pwscf.org
>     http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
>
> -- 
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum

-- 
PhD. Research Fellow,
Dept. of Physics & Materials Science,
City University of Hong Kong
Tel: +852 3442 4000
Fax: +852 3442 0538
