[Pw_forum] mpi error using pw.x
Paolo Giannozzi
p.giannozzi at gmail.com
Sun May 15 14:28:26 CEST 2016
It looks like a compiler/MPI bug, since there is nothing special in your
input or in the way you run the code, unless you find evidence that the
problem is reproducible with other compiler/MPI versions.
Paolo
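
For context, the "Too many communicators (0/16384 free on this process)"
message quoted below is emitted when MPI's internal pool of communicator
context ids is exhausted, not when an OS file-descriptor limit is hit, which
is presumably why raising the open-file limit did not help. A minimal,
stand-alone sketch of how such an exhaustion can arise (plain C against the
standard MPI API; it is not code taken from Quantum ESPRESSO) is:

    /* If sub-communicators obtained from MPI_Cart_sub are never freed,
     * the fixed pool of context ids eventually runs dry and the MPI
     * library aborts with "Too many communicators". */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int nprocs;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* build a 2D Cartesian communicator over all processes */
        int dims[2] = {0, 0}, periods[2] = {0, 0};
        MPI_Dims_create(nprocs, 2, dims);
        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

        int remain[2] = {1, 0};          /* keep rows, drop columns */
        for (int i = 0; i < 20000; i++) {
            MPI_Comm row;
            MPI_Cart_sub(cart, remain, &row);
            /* MPI_Comm_free(&row);   without this free, an MPICH-based
               MPI (such as Intel MPI) runs out of context ids after
               roughly 16k iterations and fails with the same stack */
        }

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and run under mpirun, this reproduces the same class of
failure; uncommenting the MPI_Comm_free call inside the loop lets it run to
completion. In a real application the leak is typically a communicator that
is created once per iteration (for example per SCF step or per
diagonalization) and never released.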
On Sun, May 15, 2016 at 10:11 AM, Chong Wang <ch-wang at outlook.com> wrote:
> Hi,
>
>
> Thank you for replying.
>
>
> More details:
>
>
> 1. input data:
>
> &control
> calculation='scf'
> restart_mode='from_scratch',
> pseudo_dir = '../pot/',
> outdir='./out/'
> prefix='BaTiO3'
> /
> &system
> nbnd = 48
> ibrav = 0, nat = 5, ntyp = 3
> ecutwfc = 50
> occupations='smearing', smearing='gaussian', degauss=0.02
> /
> &electrons
> conv_thr = 1.0e-8
> /
> ATOMIC_SPECIES
> Ba 137.327 Ba.pbe-mt_fhi.UPF
> Ti 204.380 Ti.pbe-mt_fhi.UPF
> O 15.999 O.pbe-mt_fhi.UPF
> ATOMIC_POSITIONS
> Ba 0.0000000000000000 0.0000000000000000 0.0000000000000000
> Ti 0.5000000000000000 0.5000000000000000 0.4819999933242795
> O 0.5000000000000000 0.5000000000000000 0.0160000007599592
> O 0.5000000000000000 -0.0000000000000000 0.5149999856948849
> O 0.0000000000000000 0.5000000000000000 0.5149999856948849
> K_POINTS (automatic)
> 11 11 11 0 0 0
> CELL_PARAMETERS {angstrom}
> 3.999800000000001 0.000000000000000 0.000000000000000
> 0.000000000000000 3.999800000000001 0.000000000000000
> 0.000000000000000 0.000000000000000 4.018000000000000
>
> 2. number of processors:
> I tested 24 cores and 8 cores, and both yield the same result.
>
> 3. type of parallelization:
> I'm not sure what you mean. I execute pw.x with:
> mpirun -np 24 pw.x < BTO.scf.in >> output
>
> 'which mpirun' output:
> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
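
(As an aside, the parallelization levels of pw.x can also be requested
explicitly on the command line. Assuming the usual QE 5.x option names,
-nk for the number of k-point pools and -ndiag for the size of the
linear-algebra group, an explicit choice would look like

mpirun -np 24 pw.x -nk 4 -ndiag 1 < BTO.scf.in > output

With no such options, as in the command above, pw.x falls back to its
defaults: plane-wave/G-vector parallelization over all processes, with the
linear-algebra group size chosen automatically.)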
>
> 4. when the error occurs:
> in the middle of the run. The last few lines of the output are:
> total cpu time spent up to now is 32.9 secs
>
> total energy = -105.97885119 Ry
> Harris-Foulkes estimate = -105.99394457 Ry
> estimated scf accuracy < 0.03479229 Ry
>
> iteration # 7 ecut= 50.00 Ry beta=0.70
> Davidson diagonalization with overlap
> ethr = 1.45E-04, avg # of iterations = 2.7
>
> total cpu time spent up to now is 37.3 secs
>
> total energy = -105.99039982 Ry
> Harris-Foulkes estimate = -105.99025175 Ry
> estimated scf accuracy < 0.00927902 Ry
>
> iteration # 8 ecut= 50.00 Ry beta=0.70
> Davidson diagonalization with overlap
>
> 5. Error message:
> Something like:
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
> free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
> PMPI_Cart_sub(178)...................:
>
> Cheers!
>
> Chong
> ------------------------------
> From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf
> of Paolo Giannozzi <p.giannozzi at gmail.com>
> Sent: Sunday, May 15, 2016 3:43 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] mpi error using pw.x
>
> Please tell us what is wrong and we will fix it.
>
> Seriously: nobody can answer your question unless you specify, as a strict
> minimum, input data, number of processors and type of parallelization that
> trigger the error, and where the error occurs (at startup, later, in the
> middle of the run, ...).
>
> Paolo
>
> On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com> wrote:
>
>> I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 update 3.
>>
>> However, when I ran pw.x the following errors were reported:
>>
>> ...
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>>
>> I googled and found that this might be caused by hitting the OS limit on
>> the number of open files. However, after I increased the number of open
>> files per process from 1024 to 40960, the error persists.
>>
>>
>> What's wrong here?
>>
>>
>> Chong Wang
>>
>> Ph. D. candidate
>>
>> Institute for Advanced Study, Tsinghua University, Beijing, 100084
>>
>> _______________________________________________
>> Pw_forum mailing list
>> Pw_forum at pwscf.org
>> http://pwscf.org/mailman/listinfo/pw_forum
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222