[Pw_forum] mpi error using pw.x

Chong Wang ch-wang at outlook.com
Sun May 15 10:11:48 CEST 2016


Hi,


Thank you for replying.


More details:


1. input data:

&control
    calculation='scf'
    restart_mode='from_scratch',
    pseudo_dir = '../pot/',
    outdir='./out/'
    prefix='BaTiO3'
/
&system
    nbnd = 48
    ibrav = 0, nat = 5, ntyp = 3
    ecutwfc = 50
    occupations='smearing', smearing='gaussian', degauss=0.02
/
&electrons
    conv_thr = 1.0e-8
/
ATOMIC_SPECIES
 Ba 137.327 Ba.pbe-mt_fhi.UPF
 Ti 204.380 Ti.pbe-mt_fhi.UPF
 O  15.999  O.pbe-mt_fhi.UPF
ATOMIC_POSITIONS
 Ba 0.0000000000000000   0.0000000000000000   0.0000000000000000
 Ti 0.5000000000000000   0.5000000000000000   0.4819999933242795
 O  0.5000000000000000   0.5000000000000000   0.0160000007599592
 O  0.5000000000000000  -0.0000000000000000   0.5149999856948849
 O  0.0000000000000000   0.5000000000000000   0.5149999856948849
K_POINTS (automatic)
11 11 11 0 0 0
CELL_PARAMETERS {angstrom}
3.999800000000001       0.000000000000000       0.000000000000000
0.000000000000000       3.999800000000001       0.000000000000000
0.000000000000000       0.000000000000000       4.018000000000000


2. number of processors:
I tested with 24 cores and with 8 cores, and both runs fail with the same error.

3. type of parallelization:
I'm not sure what you mean. I launch pw.x like this (a sketch with the parallelization flags spelled out explicitly follows point 5):
mpirun  -np 24 pw.x < BTO.scf.in >> output

'which mpirun' output:
/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun

4. when the error occurs:
in the middle of the run. The last few lines of the output are:
     total cpu time spent up to now is       32.9 secs

     total energy              =    -105.97885119 Ry
     Harris-Foulkes estimate   =    -105.99394457 Ry
     estimated scf accuracy    <       0.03479229 Ry

     iteration #  7     ecut=    50.00 Ry     beta=0.70
     Davidson diagonalization with overlap
     ethr =  1.45E-04,  avg # of iterations =  2.7

     total cpu time spent up to now is       37.3 secs

     total energy              =    -105.99039982 Ry
     Harris-Foulkes estimate   =    -105.99025175 Ry
     estimated scf accuracy    <       0.00927902 Ry

     iteration #  8     ecut=    50.00 Ry     beta=0.70
     Davidson diagonalization with overlap

5. Error message:
Something like:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
PMPI_Cart_sub(178)...................:
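
Coming back to point 3: for reference, here is roughly how the parallelization levels could be spelled out explicitly on the pw.x command line. The -npool/-ndiag values below are only an illustration, not what I actually used:

# illustrative only: split the 24 MPI tasks into 4 k-point pools
# and use a 2x2 grid for the parallel (ScaLAPACK) diagonalization
mpirun -np 24 pw.x -npool 4 -ndiag 4 < BTO.scf.in >> output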

Cheers!

Chong
________________________________
From: pw_forum-bounces at pwscf.org on behalf of Paolo Giannozzi <p.giannozzi at gmail.com>
Sent: Sunday, May 15, 2016 3:43 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] mpi error using pw.x

Please tell us what is wrong and we will fix it.

Seriously: nobody can answer your question unless you specify, as a strict minimum, input data, number of processors and type of parallelization that trigger the error, and where the error occurs (at startup, later, in the middle of the run, ...).

Paolo

On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com> wrote:

I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 Update 3.

However, when I ran pw.x the following errors were reported:

...
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)


I googled and found that this might be caused by hitting the OS limit on the number of open files. However, after I increased the number of open files per process from 1024 to 40960, the error persists.
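
For what it's worth, I raised the limit in the shell that launches mpirun, roughly like this (bash; how to make the change permanent differs per system):

# check the current per-process limit on open files
ulimit -n
# raise it for this shell and everything it spawns, before calling mpirun
ulimit -n 40960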


What's wrong here?


Chong Wang

Ph. D. candidate

Institute for Advanced Study, Tsinghua University, Beijing, 100084




--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
