[Pw_forum] mpi error using pw.x

Paolo Giannozzi p.giannozzi at gmail.com
Sun May 15 14:28:26 CEST 2016


It looks like a compiler/MPI bug, since there is nothing special in your
input or in the way you run the code; unless you find evidence that the
problem is also reproducible with other compiler/MPI versions, that is the
most likely explanation.
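
For what it is worth, the error message itself says what ran out: MPICH-based
libraries (Intel MPI included) keep a finite pool of communicator "context
IDs" per process, and "Too many communicators (0/16384 free)" means that pool
is empty at the moment MPI_Cart_sub asks for a new one. The sketch below is
not QE code, just a minimal C illustration of that failure mode, assuming an
MPICH-based MPI; the file name and loop count are made up:

/* leak_demo.c: repeatedly derive sub-communicators from a Cartesian
 * communicator without freeing them, until the context-ID pool runs out.
 * Compile: mpicc leak_demo.c -o leak_demo
 * Run:     mpirun -np 4 ./leak_demo
 */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* A 1 x nprocs Cartesian grid, similar in spirit to the process grids
     * used for parallel diagonalization. */
    int dims[2] = {1, nprocs};
    int periods[2] = {0, 0};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    int remain_dims[2] = {0, 1};   /* keep only the second direction */
    for (int i = 0; i < 100000; i++) {
        MPI_Comm sub;
        MPI_Cart_sub(cart, remain_dims, &sub);
        /* The missing MPI_Comm_free(&sub) is the leak: each call consumes
         * one context ID, and the run aborts with "Too many communicators"
         * once the per-process pool (16384 in your log) is exhausted. */
    }

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}

Whether the IDs are being leaked by the application or miscounted by the
library is exactly what separates a code bug from a compiler/MPI bug, which
is why trying another MPI version is the informative experiment.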

Paolo

On Sun, May 15, 2016 at 10:11 AM, Chong Wang <ch-wang at outlook.com> wrote:

> Hi,
>
>
> Thank you for replying.
>
>
> More details:
>
>
> 1. input data:
>
> &control
>     calculation='scf'
>     restart_mode='from_scratch',
>     pseudo_dir = '../pot/',
>     outdir='./out/'
>     prefix='BaTiO3'
> /
> &system
>     nbnd = 48
>     ibrav = 0, nat = 5, ntyp = 3
>     ecutwfc = 50
>     occupations='smearing', smearing='gaussian', degauss=0.02
> /
> &electrons
>     conv_thr = 1.0e-8
> /
> ATOMIC_SPECIES
>  Ba 137.327 Ba.pbe-mt_fhi.UPF
>  Ti 204.380 Ti.pbe-mt_fhi.UPF
>  O  15.999  O.pbe-mt_fhi.UPF
> ATOMIC_POSITIONS
>  Ba 0.0000000000000000   0.0000000000000000   0.0000000000000000
>  Ti 0.5000000000000000   0.5000000000000000   0.4819999933242795
>  O  0.5000000000000000   0.5000000000000000   0.0160000007599592
>  O  0.5000000000000000  -0.0000000000000000   0.5149999856948849
>  O  0.0000000000000000   0.5000000000000000   0.5149999856948849
> K_POINTS (automatic)
> 11 11 11 0 0 0
> CELL_PARAMETERS {angstrom}
> 3.999800000000001       0.000000000000000       0.000000000000000
> 0.000000000000000       3.999800000000001       0.000000000000000
> 0.000000000000000       0.000000000000000       4.018000000000000
>
> 2. number of processors:
> I tested with 24 cores and with 8 cores; both give the same error.
>
> 3. type of parallelization:
> I'm not sure what you mean. I run pw.x with the plain command below, with
> no explicit parallelization flags (see the note after the 'which mpirun'
> output):
> mpirun  -np 24 pw.x < BTO.scf.in >> output
>
> 'which mpirun' output:
> /opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun
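>
> Note on the parallelization type: I do not pass any of pw.x's explicit
> parallelization flags, so the defaults apply (no k-point pools, an
> automatically chosen diagonalization group). If it helps to check whether
> the error depends on the layout, a run along these lines should also work;
> the pool and diagonalization-group sizes here are just arbitrary example
> values:
>
> mpirun -np 24 pw.x -nk 4 -nd 1 < BTO.scf.in >> output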
>
> 4. when the error occurs:
> In the middle of the run. The last few lines of the output are:
>      total cpu time spent up to now is       32.9 secs
>
>      total energy              =    -105.97885119 Ry
>      Harris-Foulkes estimate   =    -105.99394457 Ry
>      estimated scf accuracy    <       0.03479229 Ry
>
>      iteration #  7     ecut=    50.00 Ry     beta=0.70
>      Davidson diagonalization with overlap
>      ethr =  1.45E-04,  avg # of iterations =  2.7
>
>      total cpu time spent up to now is       37.3 secs
>
>      total energy              =    -105.99039982 Ry
>      Harris-Foulkes estimate   =    -105.99025175 Ry
>      estimated scf accuracy    <       0.00927902 Ry
>
>      iteration #  8     ecut=    50.00 Ry     beta=0.70
>      Davidson diagonalization with overlap
>
> 5. Error message:
> Something like:
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed
> PMPI_Cart_sub(178)...................:
> MPIR_Comm_split_impl(270)............:
> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
> free on this process; ignore_id=0)
> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
> remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed
> PMPI_Cart_sub(178)...................:
>
> Cheers!
>
> Chong
> ------------------------------
> From: pw_forum-bounces at pwscf.org <pw_forum-bounces at pwscf.org> on behalf
> of Paolo Giannozzi <p.giannozzi at gmail.com>
> Sent: Sunday, May 15, 2016 3:43 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] mpi error using pw.x
>
> Please tell us what is wrong and we will fix it.
>
> Seriously: nobody can answer your question unless you specify, at a bare
> minimum, the input data, the number of processors, and the type of
> parallelization that trigger the error, and where the error occurs (at
> startup, later, in the middle of the run, ...).
>
> Paolo
>
> On Sun, May 15, 2016 at 7:50 AM, Chong Wang <ch-wang at outlook.com> wrote:
>
>> I compiled Quantum ESPRESSO 5.4 with Intel MPI and MKL 2016 Update 3.
>>
>> However, when I ran pw.x, the following errors were reported:
>>
>> ...
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>> Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
>> PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3,
>> remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed
>> PMPI_Cart_sub(178)...................:
>> MPIR_Comm_split_impl(270)............:
>> MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384
>> free on this process; ignore_id=0)
>>
>> I googled and found that this might be caused by hitting the OS limit on
>> the number of open files. However, after I increased the open-file limit
>> per process from 1024 to 40960, the error persists.
>>
>>
>> What's wrong here?
>>
>>
>> Chong Wang
>>
>> Ph. D. candidate
>>
>> Institute for Advanced Study, Tsinghua University, Beijing, 100084
>>
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222