[Pw_forum] mpi error using pw.x
Chong Wang
ch-wang at outlook.com
Sun May 15 18:01:37 CEST 2016
Hi,
I have done more tests:
1. Intel MPI 2015 yields a segmentation fault
2. Intel MPI 2013 yields the same error as reported below
Did I do something wrong when compiling? Here is my make.sys:
# make.sys. Generated from make.sys.in by configure.
# compilation rules
.SUFFIXES :
.SUFFIXES : .o .c .f .f90
# most fortran compilers can directly preprocess c-like directives: use
# $(MPIF90) $(F90FLAGS) -c $<
# if explicit preprocessing by the C preprocessor is needed, use:
# $(CPP) $(CPPFLAGS) $< -o $*.F90
# $(MPIF90) $(F90FLAGS) -c $*.F90 -o $*.o
# remember the tabulator in the first column !!!
.f90.o:
	$(MPIF90) $(F90FLAGS) -c $<
# .f.o and .c.o: do not modify
.f.o:
	$(F77) $(FFLAGS) -c $<
.c.o:
	$(CC) $(CFLAGS) -c $<
# Top QE directory, not used in QE but useful for linking QE libs with plugins
# The following syntax should always point to TOPDIR:
# $(dir $(abspath $(filter %make.sys,$(MAKEFILE_LIST))))
TOPDIR = /home/wangc/temp/espresso-5.4.0
# DFLAGS = precompilation options (possible arguments to -D and -U)
# used by the C compiler and preprocessor
# FDFLAGS = as DFLAGS, for the f90 compiler
# See include/defs.h.README for a list of options and their meaning
# With the exception of IBM xlf, FDFLAGS = $(DFLAGS)
# For IBM xlf, FDFLAGS is the same as DFLAGS with separating commas
# MANUAL_DFLAGS = additional precompilation option(s), if desired
# BEWARE: it does not work for IBM xlf! Manually edit FDFLAGS
MANUAL_DFLAGS =
DFLAGS = -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK
FDFLAGS = $(DFLAGS) $(MANUAL_DFLAGS)
# IFLAGS = how to locate directories with *.h or *.f90 file to be included
# typically -I../include -I/some/other/directory/
# the latter contains e.g. files needed by FFT libraries
IFLAGS = -I../include -I/opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/include
# MOD_FLAGS = flag used by f90 compiler to locate modules
# Each Makefile defines the list of needed modules in MODFLAGS
MOD_FLAG = -I
# Compilers: fortran-90, fortran-77, C
# If a parallel compilation is desired, MPIF90 should be a fortran-90
# compiler that produces executables for parallel execution using MPI
# (such as for instance mpif90, mpf90, mpxlf90,...);
# otherwise, an ordinary fortran-90 compiler (f90, g95, xlf90, ifort,...)
# If you have a parallel machine but no suitable candidate for MPIF90,
# try to specify the directory containing "mpif.h" in IFLAGS
# and to specify the location of MPI libraries in MPI_LIBS
MPIF90 = mpif90
#F90 = gfortran
CC = cc
F77 = gfortran
# C preprocessor and preprocessing flags - for explicit preprocessing,
# if needed (see the compilation rules above)
# preprocessing flags must include DFLAGS and IFLAGS
CPP = cpp
CPPFLAGS = -P -C -traditional $(DFLAGS) $(IFLAGS)
# compiler flags: C, F90, F77
# C flags must include DFLAGS and IFLAGS
# F90 flags must include MODFLAGS, IFLAGS, and FDFLAGS with appropriate syntax
CFLAGS = -O3 $(DFLAGS) $(IFLAGS)
F90FLAGS = $(FFLAGS) -x f95-cpp-input $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
FFLAGS = -O3 -g
# compiler flags without optimization for fortran-77
# the latter is NEEDED to properly compile dlamch.f, used by lapack
FFLAGS_NOOPT = -O0 -g
# compiler flag needed by some compilers when the main program is not fortran
# Currently used for Yambo
FFLAGS_NOMAIN =
# Linker, linker-specific flags (if any)
# Typically LD coincides with F90 or MPIF90, LD_LIBS is empty
LD = mpif90
LDFLAGS = -g -pthread
LD_LIBS =
# External Libraries (if any) : blas, lapack, fft, MPI
# If you have nothing better, use the local copy :
# BLAS_LIBS = /your/path/to/espresso/BLAS/blas.a
# BLAS_LIBS_SWITCH = internal
BLAS_LIBS = -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
BLAS_LIBS_SWITCH = external
# If you have nothing better, use the local copy :
# LAPACK_LIBS = /your/path/to/espresso/lapack-3.2/lapack.a
# LAPACK_LIBS_SWITCH = internal
# For IBM machines with essl (-D__ESSL): load essl BEFORE lapack !
# remember that LAPACK_LIBS precedes BLAS_LIBS in loading order
LAPACK_LIBS =
LAPACK_LIBS_SWITCH = external
ELPA_LIBS_SWITCH = disabled
SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
# nothing needed here if the internal copy of FFTW is compiled
# (needs -D__FFTW in DFLAGS)
FFT_LIBS =
# For parallel execution, the correct path to MPI libraries must
# be specified in MPI_LIBS (except for IBM if you use mpxlf)
MPI_LIBS =
# IBM-specific: MASS libraries, if available and if -D__MASS is defined in FDFLAGS
MASS_LIBS =
# ar command and flags - for most architectures: AR = ar, ARFLAGS = ruv
AR = ar
ARFLAGS = ruv
# ranlib command. If ranlib is not needed (it isn't in most cases) use
# RANLIB = echo
RANLIB = ranlib
# all internal and external libraries - do not modify
FLIB_TARGETS = all
LIBOBJS = ../clib/clib.a ../iotk/src/libiotk.a
LIBS = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)
# wget or curl - useful to download from network
WGET = wget -O
# Install directory - not currently used
PREFIX = /usr/local
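A minimal sketch, not part of the original message, of one way to pull the compiler and library settings out of a make.sys so that they can be compared side by side (for example, which compiler the preprocessor flags assume versus which MKL interface and BLACS layers are linked). It assumes simple `NAME = value` lines; the function name `parse_make_vars` and the embedded excerpt are illustrative only.

```python
import re


def parse_make_vars(text):
    """Return {name: value} for simple 'NAME = value' lines, skipping comments."""
    assignments = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line, comment, or commented-out assignment
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$", stripped)
        if match:
            assignments[match.group(1)] = match.group(2).strip()
    return assignments


# Excerpt copied from the make.sys above (illustrative subset).
sample = """\
DFLAGS = -D__GFORTRAN -D__STD_F95 -D__DFTI -D__MPI -D__PARA -D__SCALAPACK
MPIF90 = mpif90
#F90 = gfortran
BLAS_LIBS = -lmkl_gf_lp64 -lmkl_sequential -lmkl_core
LAPACK_LIBS =
SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64
"""

settings = parse_make_vars(sample)
for name in ("MPIF90", "DFLAGS", "BLAS_LIBS", "SCALAPACK_LIBS"):
    print(f"{name:15s} {settings.get(name, '')}")
```

The sketch only lists the settings; deciding whether a given combination of compiler, MPI and MKL layers is supported still requires the vendor documentation.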
Cheers!
Chong Wang
________________________________
From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Sunday, May 15, 2016 8:28:26 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] mpi error using pw.x
It looks like a compiler/mpi bug, since there is nothing special in your input and in your execution, unless you find evidence that the problem is reproducible on other compiler/mpi versions.
Paolo
On Sun, May 15, 2016 at 10:11 AM, Chong Wang <ch-wang at outlook.com> wrote:
Hi,
Thank you for replying.
More details:
1. input data:
&control
calculation='scf'
restart_mode='from_scratch',
pseudo_dir = '../pot/',
outdir='./out/'
prefix='BaTiO3'
/
&system
nbnd = 48
ibrav = 0, nat = 5, ntyp = 3
ecutwfc = 50
occupations='smearing', smearing='gaussian', degauss=0.02
/
&electrons
conv_thr = 1.0e-8
/
ATOMIC_SPECIES
Ba 137.327 Ba.pbe-mt_fhi.UPF
Ti 204.380 Ti.pbe-mt_fhi.UPF
O 15.999 O.pbe-mt_fhi.UPF
ATOMIC_POSITIONS
Ba 0.0000000000000000 0.0000000000000000 0.0000000000000000
Ti 0.5000000000000000 0.5000000000000000 0.4819999933242795
O 0.5000000000000000 0.5000000000000000 0.0160000007599592
O 0.5000000000000000 -0.0000000000000000 0.5149999856948849
O 0.0000000000000000 0.5000000000000000 0.5149999856948849
K_POINTS (automatic)
11 11 11 0 0 0
CELL_PARAMETERS {angstrom}
3.999800000000001 0.000000000000000 0.000000000000000
0.000000000000000 3.999800000000001 0.000000000000000
0.000000000000000 0.000000000000000 4.018000000000000
2. number of processors:
I tested with 24 cores and with 8 cores; both give the same result.
3. type of parallelization:
I am not sure what you mean. I run pw.x with:
mpirun -np 24 pw.x < BTO.scf.in >> output
'which mpirun' output:
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
4. when the error occurs:
In the middle of the run. The last few lines of the output are:
total cpu time spent up to now is 32.9 secs
total energy = -105.97885119 Ry
Harris-Foulkes estimate = -105.99394457 Ry
estimated scf accuracy < 0.03479229 Ry
iteration # 7 ecut= 50.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 1.45E-04, avg # of iterations = 2.7
total cpu time spent up to now is 37.3 secs
total energy = -105.99039982 Ry
Harris-Foulkes estimate = -105.99025175 Ry
estimated scf accuracy < 0.00927902 Ry
iteration # 8 ecut= 50.00 Ry beta=0.70
Davidson diagonalization with overlap
5. Error message:
Something like:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
PMPI_Cart_sub(178)...................:
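A minimal sketch, not part of the original message, of how an error stack like the one in item 5 could be summarized when many ranks print it at once. It counts which MPI call fails and extracts the "free/total" communicator figures from the last line of each stack. The regular expressions are assumptions based only on the lines quoted in this thread, and the names `summarize`, `FATAL` and `POOL` are illustrative.

```python
import re
from collections import Counter

# One complete error stack, copied from the message above.
LOG = """\
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
"""

FATAL = re.compile(r"^Fatal error in (\w+):")
POOL = re.compile(r"Too many communicators \((\d+)/(\d+) free")


def summarize(log_text):
    """Count failing MPI calls and collect (free, total) communicator figures."""
    failing_calls = Counter()
    pools = set()
    for line in log_text.splitlines():
        fatal = FATAL.match(line)
        if fatal:
            failing_calls[fatal.group(1)] += 1
        pool = POOL.search(line)
        if pool:
            pools.add((int(pool.group(1)), int(pool.group(2))))
    return failing_calls, pools


calls, pools = summarize(LOG)
for name, count in calls.items():
    print(f"{name}: {count} fatal error(s)")
for free, total in sorted(pools):
    print(f"communicators free: {free} of {total}")
```

Every stack quoted in this thread fails in the same call with no communicators left, and the job gets through several SCF iterations before failing. That would be consistent with communicators being created repeatedly during the run without being released, although the log alone does not show where.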
Cheers!
Chong
________________________________
From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Sunday, May 15, 2016 3:43 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] mpi error using pw.x
Please tell us what is wrong and we will fix it.
Seriously: nobody can answer your question unless you specify, as a strict minimum, input data, number of processors and type of parallelization that trigger the error, and where the error occurs (at startup, later, in the middle of the run, ...).
Paolo
On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com> wrote:
I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 Update 3.
However, when I ran pw.x the following errors were reported:
...
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
I googled and found that this might be caused by hitting the OS limit on the number of open files. However, after I increased the number of open files per process from 1024 to 40960, the error persisted.
What's wrong here?
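A minimal sketch, not part of the original message, of how the open-file limit can be read from inside a process with Python's standard `resource` module (Unix only). A limit raised in one shell is inherited only by processes started from that shell, so reading it from within the launched process is one way to confirm what actually applies. The helper name `describe` is illustrative.

```python
import resource


def describe(limit):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# Soft and hard limits on open file descriptors for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

The figure 16384 in the error message counts communicators available inside the MPI library rather than file descriptors, which would fit the observation that raising the open-file limit did not change the outcome.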
Chong Wang
Ph. D. candidate
Institute for Advanced Study, Tsinghua University, Beijing, 100084
_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222