[Pw_forum] mpi error using pw.x

Paolo Giannozzi p.giannozzi at gmail.com
Mon May 16 11:52:33 CEST 2016


On Mon, May 16, 2016 at 4:11 AM, Chong Wang <ch-wang at outlook.com> wrote:

I have checked that my mpif90 calls gfortran, so there's no mix-up.
>

I am not sure it is possible to use gfortran together with Intel MPI. If
you have Intel MPI and MKL, presumably you have the Intel compiler as well.
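
For reference, a quick way to see which compiler an mpif90 wrapper actually
invokes (the -show option is supported by MPICH-derived wrappers such as
Intel MPI's; Open MPI uses -showme instead):

mpif90 -show        # print the underlying compiler and link line
mpif90 --version    # version banner of the compiler behind the wrapper
which mpiifort      # Intel MPI also ships an ifort-specific wrapper, mpiifort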


> Can you kindly share with me your make.sys?
>

It doesn't make sense to share a make.sys file unless the software
configuration is the same.
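
Rather than copying someone else's make.sys, it is safer to regenerate it
with configure for a consistent toolchain. A minimal sketch, assuming the
Intel compilers and Intel MPI's mpiifort wrapper are installed and in the
PATH (exact names depend on the installation):

cd /home/wangc/temp/espresso-5.4.0
make veryclean                                 # discard the old configuration
./configure MPIF90=mpiifort CC=icc F77=ifort
grep -E 'DFLAGS|BLAS_LIBS|SCALAPACK_LIBS' make.sys   # verify ifort/MKL settings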

Paolo


Thanks in advance!
>
>
> Best!
>
>
> Chong Wang
> ------------------------------
> *From:* pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf
> of Paolo Giannozzi <p.giannozzi at gmail.com>
> *Sent:* Monday, May 16, 2016 3:10 AM
> *To:* PWSCF Forum
> *Subject:* Re: [Pw_forum] mpi error using pw.x
>
> Your make.sys shows clear signs of a mix-up between ifort and gfortran.
> Please verify that mpif90 calls ifort and not gfortran (or vice versa).
> Configure issues a warning if this happens.
>
> I have successfully run your test on a machine with a recent Intel
> compiler and Intel MPI. The second output (run as mpirun -np 18 pw.x -nk
> 18....) is an example of what I mean by "type of parallelization": there
> are many different parallelization levels in QE. This one is over k-points
> (and in this case runs faster on fewer processors than parallelization
> over plane waves).
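
To make the two parallelization levels concrete, a hedged example using the
input file from this thread (-nk sets the number of k-point pools; flag
names as in QE 5.x):

# plane-wave (G-vector) parallelization only: all processes share each k-point
mpirun -np 24 pw.x < BTO.scf.in > BTO.scf.out
# k-point parallelization: 18 pools, k-points distributed among them
mpirun -np 18 pw.x -nk 18 < BTO.scf.in > BTO.scf.out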
>
> Paolo
>
> On Sun, May 15, 2016 at 6:01 PM, Chong Wang <ch-wang at outlook.com> wrote:
>
>> Hi,
>>
>>
>> I have done more tests:
>>
>> 1. Intel MPI 2015 yields a segmentation fault
>>
>> 2. Intel MPI 2013 yields the same error as reported here
>>
>> Did I do something wrong when compiling? Here's my make.sys:
>>
>>
>> # make.sys.  Generated from make.sys.in by configure.
>>
>>
>> # compilation rules
>>
>>
>> .SUFFIXES :
>>
>> .SUFFIXES : .o .c .f .f90
>>
>>
>> # most fortran compilers can directly preprocess c-like directives: use
>>
>> # $(MPIF90) $(F90FLAGS) -c $<
>>
>> # if explicit preprocessing by the C preprocessor is needed, use:
>>
>> # $(CPP) $(CPPFLAGS) $< -o $*.F90
>>
>> # $(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o
>>
>> # remember the tabulator in the first column !!!
>>
>>
>> .f90.o:
>>
>> $(MPIF90) $(F90FLAGS) -c $<
>>
>>
>> # .f.o and .c.o: do not modify
>>
>>
>> .f.o:
>>
>> $(F77) $(FFLAGS) -c $<
>>
>>
>> .c.o:
>>
>> $(CC) $(CFLAGS)  -c $<
>>
>>
>>
>>
>> # Top QE directory, not used in QE but useful for linking QE libs with
>> plugins
>>
>> # The following syntax should always point to TOPDIR:
>>
>> #   $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))
>>
>>
>> TOPDIR = /home/wangc/temp/espresso-5.4.0
>>
>>
>> # DFLAGS  = precompilation options (possible arguments to -D and -U)
>>
>> #           used by the C compiler and preprocessor
>>
>> # FDFLAGS = as DFLAGS, for the f90 compiler
>>
>> # See include/defs.h.README for a list of options and their meaning
>>
>> # With the exception of IBM xlf, FDFLAGS = $(DFLAGS)
>>
>> # For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas
>>
>>
>> # MANUAL_DFLAGS  = additional precompilation option(s), if desired
>>
>> #                  BEWARE: it does not work for IBM xlf! Manually edit
>> FDFLAGS
>>
>> MANUAL_DFLAGS  =
>>
>> DFLAGS         =  -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA
>> -D__SCALAPACK
>>
>> FDFLAGS        = $(DFLAGS) $(MANUAL_DFLAGS)
>>
>>
>> # IFLAGS = how to locate directories with *.h or *.f90 file to be included
>>
>> #          typically -I../include -I/some/other/directory/
>>
>> #          the latter contains e.g. files needed by FFT libraries
>>
>>
>> IFLAGS         = -I../include
>> -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
>>
>>
>> # MOD_FLAGS = flag used by f90 compiler to locate modules
>>
>> # Each Makefile defines the list of needed modules in MODFLAGS
>>
>>
>> MOD_FLAG      = -I
>>
>>
>> # Compilers: fortran-90, fortran-77, C
>>
>> # If a parallel compilation is desired, MPIF90 should be a fortran-90
>>
>> # compiler that produces executables for parallel execution using MPI
>>
>> # (such as for instance mpif90, mpf90, mpxlf90,...);
>>
>> # otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)
>>
>> # If you have a parallel machine but no suitable candidate for MPIF90,
>>
>> # try to specify the directory containing "mpif.h" in IFLAGS
>>
>> # and to specify the location of MPI libraries in MPI_LIBS
>>
>>
>> MPIF90         = mpif90
>>
>> #F90           = gfortran
>>
>> CC             = cc
>>
>> F77            = gfortran
>>
>>
>> # C preprocessor and preprocessing flags - for explicit preprocessing,
>>
>> # if needed (see the compilation rules above)
>>
>> # preprocessing flags must include DFLAGS and IFLAGS
>>
>>
>> CPP            = cpp
>>
>> CPPFLAGS       = -P -C -traditional $(DFLAGS) $(IFLAGS)
>>
>>
>> # compiler flags: C, F90, F77
>>
>> # C flags must include DFLAGS and IFLAGS
>>
>> # F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate
>> syntax
>>
>>
>> CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)
>>
>> F90FLAGS       = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS)
>> $(MODFLAGS)
>>
>> FFLAGS         = -O3 -g
>>
>>
>> # compiler flags without optimization for fortran-77
>>
>> # the latter is NEEDED to properly compile dlamch.f, used by lapack
>>
>>
>> FFLAGS_NOOPT   = -O0 -g
>>
>>
>> # compiler flag needed by some compilers when the main program is not
>> fortran
>>
>> # Currently used for Yambo
>>
>>
>> FFLAGS_NOMAIN   =
>>
>>
>> # Linker, linker-specific flags (if any)
>>
>> # Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
>>
>>
>> LD             = mpif90
>>
>> LDFLAGS        =  -g -pthread
>>
>> LD_LIBS        =
>>
>>
>> # External Libraries (if any) : blas, lapack, fft, MPI
>>
>>
>> # If you have nothing better, use the local copy :
>>
>> # BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a
>>
>> # BLAS_LIBS_SWITCH = internal
>>
>>
>> BLAS_LIBS      =   -lmkl_gf_lp64  -lmkl_sequential -lmkl_core
>>
>> BLAS_LIBS_SWITCH = external
>>
>>
>> # If you have nothing better, use the local copy :
>>
>> # LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a
>>
>> # LAPACK_LIBS_SWITCH = internal
>>
>> # For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !
>>
>> # remember that LAPACK_LIBS precedes BLAS_LIBS in loading order
>>
>>
>> LAPACK_LIBS    =
>>
>> LAPACK_LIBS_SWITCH = external
>>
>>
>> ELPA_LIBS_SWITCH = disabled
>>
>> SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
>>
>>
>> # nothing needed here if the internal copy of FFTW is compiled
>>
>> # (needs -D__FFTW in DFLAGS)
>>
>>
>> FFT_LIBS       =
>>
>>
>> # For parallel execution, the correct path to MPI libraries must
>>
>> # be specified in MPI_LIBS (except for IBM if you use mpxlf)
>>
>>
>> MPI_LIBS       =
>>
>>
>> # IBM-specific: MASS libraries, if available and if -D__MASS is defined
>> in FDFLAGS
>>
>>
>> MASS_LIBS      =
>>
>>
>> # ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv
>>
>>
>> AR             = ar
>>
>> ARFLAGS        = ruv
>>
>>
>> # ranlib command. If ranlib is not needed (it isn't in most cases) use
>>
>> # RANLIB = echo
>>
>>
>> RANLIB         = ranlib
>>
>>
>> # all internal and external libraries - do not modify
>>
>>
>> FLIB_TARGETS   = all
>>
>>
>> LIBOBJS        = ../clib/clib.a ../iotk/src/libiotk.a
>>
>> LIBS           = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS)
>> $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)
>>
>>
>> # wget or curl - useful to download from network
>>
>> WGET = wget -O
>>
>>
>> # Install directory - not currently used
>>
>> PREFIX = /usr/local
>>
>> Cheers!
>>
>>
>> Chong Wang
>> ------------------------------
>> *From:* pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on
>> behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
>> *Sent:* Sunday, May 15, 2016 8:28:26 PM
>>
>> *To:* PWSCF Forum
>> *Subject:* Re: [Pw_forum] mpi error using pw.x
>>
>> It looks like a compiler/MPI bug, since there is nothing special in your
>> input or in your execution, unless you find evidence that the problem is
>> reproducible with other compiler/MPI versions.
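
A hedged way to confirm which MPI library is actually picked up at run time
(I_MPI_DEBUG is an Intel MPI environment variable that makes the library
print startup information, including its version):

mpirun --version                                 # version of the Intel MPI launcher
I_MPI_DEBUG=5 mpirun -np 2 pw.x < BTO.scf.in > debug.out
head debug.out                                   # the MPI startup lines should show the library version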
>>
>> Paolo
>>
>> On Sun, May 15, 2016 at 10:11 AM, Chong Wang <ch-wang at outlook.com> wrote:
>>
>>> Hi,
>>>
>>>
>>> Thank you for replying.
>>>
>>>
>>> More details:
>>>
>>>
>>> 1. input data:
>>>
>>> &control
>>>     calculation='scf'
>>>     restart_mode='from_scratch',
>>>     pseudo_dir = '../pot/',
>>>     outdir='./out/'
>>>     prefix='BaTiO3'
>>> /
>>> &system
>>>     nbnd = 48
>>>     ibrav = 0, nat = 5, ntyp = 3
>>>     ecutwfc = 50
>>>     occupations='smearing', smearing='gaussian', degauss=0.02
>>> /
>>> &electrons
>>>     conv_thr = 1.0e-8
>>> /
>>> ATOMIC_SPECIES
>>>  Ba 137.327 Ba.pbe-mt_fhi.UPF
>>>  Ti 204.380 Ti.pbe-mt_fhi.UPF
>>>  O  15.999  O.pbe-mt_fhi.UPF
>>> ATOMIC_POSITIONS
>>>  Ba 0.0000000000000000   0.0000000000000000   0.0000000000000000
>>>  Ti 0.5000000000000000   0.5000000000000000   0.4819999933242795
>>>  O  0.5000000000000000   0.5000000000000000   0.0160000007599592
>>>  O  0.5000000000000000  -0.0000000000000000   0.5149999856948849
>>>  O  0.0000000000000000   0.5000000000000000   0.5149999856948849
>>> K_POINTS (automatic)
>>> 11 11 11 0 0 0
>>> CELL_PARAMETERS {angstrom}
>>> 3.999800000000001       0.000000000000000       0.000000000000000
>>> 0.000000000000000       3.999800000000001       0.000000000000000
>>> 0.000000000000000       0.000000000000000       4.018000000000000
>>>
>>> 2. number of processors:
>>> I tested 24 cores and 8 cores, and both yield the same result.
>>>
>>> 3. type of parallelization:
>>> I am not sure what you mean. I execute pw.x with:
>>> mpirun  -np 24 pw.x < BTO.scf.in >> output
>>>
>>> 'which mpirun' output:
>>>
>>> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
>>>
>>> 4. when the error occurs:
>>> In the middle of the run. The last few lines of the output are:
>>>      total cpu time spent up to now is       32.9 secs
>>>
>>>      total energy              =    -105.97885119 Ry
>>>      Harris-Foulkes estimate   =    -105.99394457 Ry
>>>      estimated scf accuracy    <       0.03479229 Ry
>>>
>>>      iteration #  7     ecut=    50.00 Ry     beta=0.70
>>>      Davidson diagonalization with overlap
>>>      ethr =  1.45E-04,  avg # of iterations =  2.7
>>>
>>>      total cpu time spent up to now is       37.3 secs
>>>
>>>      total energy              =    -105.99039982 Ry
>>>      Harris-Foulkes estimate   =    -105.99025175 Ry
>>>      estimated scf accuracy    <       0.00927902 Ry
>>>
>>>      iteration #  8     ecut=    50.00 Ry     beta=0.70
>>>      Davidson diagonalization with overlap
>>>
>>> 5. Error message:
>>> Something like:
>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>> remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
>>> PMPI_Cart_sub(178)...................:
>>> MPIR_Comm_split_impl(270)............:
>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>> free on this process; ignore_id=0)
>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>> remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
>>> PMPI_Cart_sub(178)...................:
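
One thing worth ruling out, given the error stack above, is a mismatch
between the MPI library pw.x was linked against and the mpirun used to
launch it. A hedged check, assuming pw.x is dynamically linked and in the
PATH:

ldd $(which pw.x) | grep -i mpi    # MPI shared libraries pw.x is linked against
which mpirun                       # launcher actually found in the PATH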
>>>
>>> Cheers!
>>>
>>> Chong
>>> ------------------------------
>>> *From:* pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on
>>> behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
>>> *Sent:* Sunday, May 15, 2016 3:43 PM
>>> *To:* PWSCF Forum
>>> *Subject:* Re: [Pw_forum] mpi error using pw.x
>>>
>>> Please tell us what is wrong and we will fix it.
>>>
>>> Seriously: nobody can answer your question unless you specify, as a
>>> strict minimum, the input data, the number of processors, and the type of
>>> parallelization that trigger the error, and where the error occurs (at
>>> startup, later, in the middle of the run, ...).
>>>
>>> Paolo
>>>
>>> On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com> wrote:
>>>
>>>> I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 Update 3.
>>>>
>>>> However, when I ran pw.x, the following errors were reported:
>>>>
>>>> ...
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>>> remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
>>>> PMPI_Cart_sub(178)...................:
>>>> MPIR_Comm_split_impl(270)............:
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>>> remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
>>>> PMPI_Cart_sub(178)...................:
>>>> MPIR_Comm_split_impl(270)............:
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>>>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>>>> remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
>>>> PMPI_Cart_sub(178)...................:
>>>> MPIR_Comm_split_impl(270)............:
>>>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>>>> free on this process; ignore_id=0)
>>>>
>>>> I googled and found that this might be caused by hitting the OS limit on the
>>>> number of open files. However, after I increased the number of open files
>>>> per process from 1024 to 40960, the error persists.
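
For reference, the per-process open-file limit mentioned above is typically
checked and raised like this (a sketch; making the change permanent is
system-dependent, e.g. via limits.conf or the job scheduler configuration):

ulimit -n          # current soft limit on open file descriptors
ulimit -n 40960    # raise it for the current shell and its child processes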
>>>>
>>>>
>>>> What's wrong here?
>>>>
>>>>
>>>> Chong Wang
>>>>
>>>> Ph. D. candidate
>>>>
>>>> Institute for Advanced Study, Tsinghua University, Beijing, 100084
>>>>
>>>
>>>
>>>
>>> --
>>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>>> Phone +39-0432-558216, fax +39-0432-558222
>>>
>>>
>>
>>
>>
>> --
>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
>> Phone +39-0432-558216, fax +39-0432-558222
>>
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222