[Pw_forum] mpi error using pw.x
Chong Wang
ch-wang at outlook.com
Sun May 15 10:11:48 CEST 2016
Hi,
Thank you for replying.
More details:
1. input data:
&control
calculation='scf'
restart_mode='from_scratch',
pseudo_dir = '../pot/',
outdir='./out/'
prefix='BaTiO3'
/
&system
nbnd = 48
ibrav = 0, nat = 5, ntyp = 3
ecutwfc = 50
occupations='smearing', smearing='gaussian', degauss=0.02
/
&electrons
conv_thr = 1.0e-8
/
ATOMIC_SPECIES
Ba 137.327 Ba.pbe-mt_fhi.UPF
Ti 204.380 Ti.pbe-mt_fhi.UPF
O 15.999 O.pbe-mt_fhi.UPF
ATOMIC_POSITIONS
Ba 0.0000000000000000 0.0000000000000000 0.0000000000000000
Ti 0.5000000000000000 0.5000000000000000 0.4819999933242795
O 0.5000000000000000 0.5000000000000000 0.0160000007599592
O 0.5000000000000000 -0.0000000000000000 0.5149999856948849
O 0.0000000000000000 0.5000000000000000 0.5149999856948849
K_POINTS (automatic)
11 11 11 0 0 0
CELL_PARAMETERS {angstrom}
3.999800000000001 0.000000000000000 0.000000000000000
0.000000000000000 3.999800000000001 0.000000000000000
0.000000000000000 0.000000000000000 4.018000000000000
2. number of processors:
I tested with 24 cores and with 8 cores; both yield the same error.
3. type of parallelization:
I'm not sure what you mean. I launch pw.x with:
mpirun -np 24 pw.x < BTO.scf.in >> output
'which mpirun' output:
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
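(Since no parallelization flags are passed, pw.x picks its defaults. For reference, the parallelization can be made explicit on the command line; a sketch assuming QE 5.x syntax, with the pool/diagonalization counts chosen only as an illustration:)

```shell
# -nk splits the 24 MPI tasks into 4 k-point pools of 6 tasks each;
# -nd 1 forces serial (LAPACK) subspace diagonalization instead of the
# distributed ScaLAPACK path. Restricting -nd is a common workaround when
# the diagonalization layer leaks MPI communicators.
mpirun -np 24 pw.x -nk 4 -nd 1 -inp BTO.scf.in >> output
```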
4. when the error occurs:
In the middle of the run. The last few lines of the output are:
total cpu time spent up to now is 32.9 secs
total energy = -105.97885119 Ry
Harris-Foulkes estimate = -105.99394457 Ry
estimated scf accuracy < 0.03479229 Ry
iteration # 7 ecut= 50.00 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 1.45E-04, avg # of iterations = 2.7
total cpu time spent up to now is 37.3 secs
total energy = -105.99039982 Ry
Harris-Foulkes estimate = -105.99025175 Ry
estimated scf accuracy < 0.00927902 Ry
iteration # 8 ecut= 50.00 Ry beta=0.70
Davidson diagonalization with overlap
5. Error message:
Something like:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
PMPI_Cart_sub(178)...................:
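(The message means the MPI library ran out of communicator context IDs: Intel MPI/MPICH keep a fixed per-process pool of 16384, and every MPI_Cart_sub/MPI_Comm_split consumes one until MPI_Comm_free returns it. A toy Python sketch of that mechanism — the class and names are hypothetical, only the 16384 limit matches the error text:)

```python
class ContextIdPool:
    """Toy model of MPI's fixed per-process pool of communicator context IDs."""

    def __init__(self, size=16384):
        self.free = size

    def create_comm(self):
        # Each MPI_Cart_sub / MPI_Comm_split consumes one context ID.
        if self.free == 0:
            raise RuntimeError(
                "Too many communicators (0/16384 free on this process)")
        self.free -= 1

    def free_comm(self):
        # MPI_Comm_free returns the ID to the pool.
        self.free += 1


pool = ContextIdPool()
try:
    # Creating a sub-communicator every iteration without ever freeing it
    # (e.g. once per SCF step) eventually exhausts the pool.
    for _ in range(20000):
        pool.create_comm()
except RuntimeError as err:
    print(err)  # → Too many communicators (0/16384 free on this process)
```

This is why the failure appears only "in the middle of the run": the pool drains gradually, one leaked communicator at a time.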
Cheers!
Chong
________________________________
From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Sunday, May 15, 2016 3:43 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] mpi error using pw.x
Please tell us what is wrong and we will fix it.
Seriously: nobody can answer your question unless you specify, as a strict minimum, input data, number of processors and type of parallelization that trigger the error, and where the error occurs (at startup, later, in the middle of the run, ...).
Paolo
On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com<mailto:ch-wang at outlook.com>> wrote:
I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 Update 3.
However, when I ran pw.x the following errors were reported:
...
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
I googled and found that this might be caused by hitting the OS limit on the number of open files. However, after I increased the number of open files per process from 1024 to 40960, the error persists.
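(For reference, the per-process file-descriptor limit can be inspected and raised from the shell before launching the job; a sketch — the attainable value depends on the system's hard limit:)

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n
# Show the hard limit, the ceiling a non-root user may raise the soft limit to
ulimit -Hn
# Raise the soft limit for this session and any child processes it launches
# (mpirun and pw.x inherit it); fails if the value exceeds the hard limit
ulimit -n 40960 2>/dev/null || echo "requested limit exceeds hard limit"
```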
What's wrong here?
Chong Wang
Ph. D. candidate
Institute for Advanced Study, Tsinghua University, Beijing, 100084
_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org<mailto:Pw_forum at pwscf.org>
http://pwscf.org/mailman/listinfo/pw_forum
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222