<div dir="ltr"><div>It looks like a compiler/MPI bug: there is nothing special in your input or in the way you run the code. That is, unless you find evidence that the problem is also reproducible with other compiler/MPI versions.<br><br></div>Paolo<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, May 15, 2016 at 10:11 AM, Chong Wang <span dir="ltr"><<a href="mailto:ch-wang@outlook.com" target="_blank">ch-wang@outlook.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">




<div dir="ltr">
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<p>Hi,</p>
<p><br>
</p>
<p>Thank you for replying.</p>
<p><br>
</p>
<p>More details:</p>
<p><br>
</p>
<p>1. input data:</p>
<p></p>
<div>&control</div>
<div>    calculation='scf'</div>
<div>    restart_mode='from_scratch',</div>
<div>    pseudo_dir = '../pot/',</div>
<div>    outdir='./out/'</div>
<div>    prefix='BaTiO3'</div>
<div>/</div>
<div>&system</div>
<div>    nbnd = 48</div>
<div>    ibrav = 0, nat = 5, ntyp = 3</div>
<div>    ecutwfc = 50</div>
<div>    occupations='smearing', smearing='gaussian', degauss=0.02 </div>
<div>/</div>
<div>&electrons</div>
<div>    conv_thr = 1.0e-8</div>
<div>/</div>
<div>ATOMIC_SPECIES</div>
<div> Ba 137.327 Ba.pbe-mt_fhi.UPF</div>
<div> Ti 204.380 Ti.pbe-mt_fhi.UPF</div>
<div> O  15.999  O.pbe-mt_fhi.UPF</div>
<div>ATOMIC_POSITIONS</div>
<div> Ba 0.0000000000000000   0.0000000000000000   0.0000000000000000</div>
<div> Ti 0.5000000000000000   0.5000000000000000   0.4819999933242795</div>
<div> O  0.5000000000000000   0.5000000000000000   0.0160000007599592</div>
<div> O  0.5000000000000000  -0.0000000000000000   0.5149999856948849</div>
<div> O  0.0000000000000000   0.5000000000000000   0.5149999856948849</div>
<div>K_POINTS (automatic)</div>
<div>11 11 11 0 0 0</div>
<div>CELL_PARAMETERS {angstrom}</div>
<div>3.999800000000001       0.000000000000000       0.000000000000000</div>
<div>0.000000000000000       3.999800000000001       0.000000000000000</div>
<div>0.000000000000000       0.000000000000000       4.018000000000000</div>
<div><br>
</div>
<p></p>
2. number of processors:</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
I tested with 24 cores and with 8 cores; both yield the same error.</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<br>
</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
3. <span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16px">
type of parallelization:</span></div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
I am not sure what you mean. I execute pw.x with:</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<span>mpirun -np 24 pw.x &lt; BTO.scf.in >> output</span></div>
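<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
For completeness, pw.x also accepts command-line flags that select the parallelization levels explicitly (k-point pools, diagonalization group). The sketch below is illustrative only; these flag values were not used in the failing run:

```shell
# Illustrative sketch (not the actual command used):
#   -nk  number of k-point pools
#   -nd  number of MPI ranks used for the (ScaLAPACK) diagonalization group
mpirun -np 24 pw.x -nk 4 -nd 1 -input BTO.scf.in >> output
```
</div>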
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<span><br>
</span></div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<span>'which mpirun' output:</span></div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<span>
<div>/opt/intel/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin/mpirun</div>
</span></div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<br>
</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<span></span>4. when the error occurs:</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
In the middle of the run. The last few lines of the output are:</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<div>     total cpu time spent up to now is       32.9 secs</div>
<div><br>
</div>
<div>     total energy              =    -105.97885119 Ry</div>
<div>     Harris-Foulkes estimate   =    -105.99394457 Ry</div>
<div>     estimated scf accuracy    <       0.03479229 Ry</div>
<div><br>
</div>
<div>     iteration #  7     ecut=    50.00 Ry     beta=0.70</div>
<div>     Davidson diagonalization with overlap</div>
<div>     ethr =  1.45E-04,  avg # of iterations =  2.7</div>
<div><br>
</div>
<div>     total cpu time spent up to now is       37.3 secs</div>
<div><br>
</div>
<div>     total energy              =    -105.99039982 Ry</div>
<div>     Harris-Foulkes estimate   =    -105.99025175 Ry</div>
<div>     estimated scf accuracy    <       0.00927902 Ry</div>
<div><br>
</div>
<div>     iteration #  8     ecut=    50.00 Ry     beta=0.70</div>
<div>     Davidson diagonalization with overlap</div>
<div><br>
</div>
</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
5. Error message:</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
Something like:</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<div>Fatal error in PMPI_Cart_sub: Other MPI error, error stack:</div>
<div>PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc03ae5f38, comm_new=0x7ffc03ae5e90) failed</div>
<div>PMPI_Cart_sub(178)...................: </div>
<div>MPIR_Comm_split_impl(270)............: </div>
<div>MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)</div>
<div>Fatal error in PMPI_Cart_sub: Other MPI error, error stack:</div>
<div>PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffd10080408, comm_new=0x7ffd10080360) failed</div>
<div>PMPI_Cart_sub(178)...................: </div>
<div><br>
</div>
Cheers!</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<br>
</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
Chong</div>
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<div style="color:rgb(0,0,0)">
<hr style="display:inline-block;width:98%">
<div dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> <a href="mailto:pw_forum-bounces@pwscf.org" target="_blank">pw_forum-bounces@pwscf.org</a> <<a href="mailto:pw_forum-bounces@pwscf.org" target="_blank">pw_forum-bounces@pwscf.org</a>> on behalf of Paolo Giannozzi <<a href="mailto:p.giannozzi@gmail.com" target="_blank">p.giannozzi@gmail.com</a>><br>
<b>Sent:</b> Sunday, May 15, 2016 3:43 PM<br>
<b>To:</b> PWSCF Forum<br>
<b>Subject:</b> Re: [Pw_forum] mpi error using pw.x</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>
<div>
<div>Please tell us what is wrong and we will fix it.<br>
<br>
</div>
Seriously: nobody can answer your question unless you specify, as a strict minimum, input data, number of processors and type of parallelization that trigger the error, and where the error occurs (at startup, later, in the middle of the run, ...).<br>
<br>
</div>
<div>Paolo<br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun, May 15, 2016 at 7:50 AM, Chong Wang <span dir="ltr">
<<a href="mailto:ch-wang@outlook.com" target="_blank">ch-wang@outlook.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div style="font-size:12pt;color:#000000;background-color:#ffffff;font-family:Calibri,Arial,Helvetica,sans-serif">
<p></p>
<div>I compiled quantum espresso 5.4 with intel mpi and mkl 2016 update 3.</div>
<div><br>
</div>
<div>However, when I ran pw.x the following errors were reported:</div>
<div><br>
</div>
<div>...</div>
<div>MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)</div>
<div>Fatal error in PMPI_Cart_sub: Other MPI error, error stack:</div>
<div>PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffde1391dd8, comm_new=0x7ffde1391d30) failed</div>
<div>PMPI_Cart_sub(178)...................: </div>
<div>MPIR_Comm_split_impl(270)............: </div>
<div>MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)</div>
<div>Fatal error in PMPI_Cart_sub: Other MPI error, error stack:</div>
<div>PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7ffc02ad7eb8, comm_new=0x7ffc02ad7e10) failed</div>
<div>PMPI_Cart_sub(178)...................: </div>
<div>MPIR_Comm_split_impl(270)............: </div>
<div>MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)</div>
<div>Fatal error in PMPI_Cart_sub: Other MPI error, error stack:</div>
<div>PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf3, remain_dims=0x7fffb24e60f8, comm_new=0x7fffb24e6050) failed</div>
<div>PMPI_Cart_sub(178)...................: </div>
<div>MPIR_Comm_split_impl(270)............: </div>
<div>MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)</div>
<br>
<p></p>
<p>I googled and found that this might be caused by hitting the OS limit on the number of open files. However, after I increased the number of open files per process from 1024 to 40960, the error persists.</p>
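<p>For reference, the per-process file-descriptor limits can be checked from the shell (the soft limit is the value I raised from 1024 to 40960):</p>

```shell
# Print the limits on open file descriptors for the current shell.
ulimit -Sn   # soft limit (the value raised to 40960)
ulimit -Hn   # hard limit (the ceiling up to which the soft limit can be raised)
```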
<p><br>
</p>
<p>What's wrong here?</p>
<p><br>
</p>
<p>Chong Wang</p>
<p>Ph. D. candidate</p>
<p>Institute for Advanced Study, Tsinghua University, Beijing, 100084</p>
</div>
</div>
<br>
_______________________________________________<br>
Pw_forum mailing list<br>
<a href="mailto:Pw_forum@pwscf.org" target="_blank">Pw_forum@pwscf.org</a><br>
<a href="http://pwscf.org/mailman/listinfo/pw_forum" rel="noreferrer" target="_blank">http://pwscf.org/mailman/listinfo/pw_forum</a><span class="HOEnZb"><font color="#888888"><br>
</font></span></blockquote><span class="HOEnZb"><font color="#888888">
</font></span></div><span class="HOEnZb"><font color="#888888">
<br>
<br clear="all">
<br>
-- <br>
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>
Phone <a href="tel:%2B39-0432-558216" value="+390432558216" target="_blank">+39-0432-558216</a>, fax <a href="tel:%2B39-0432-558222" value="+390432558222" target="_blank">+39-0432-558222</a><br>
<br>
</div>
</div>
</div>
</div>
</div>
</font></span></div>
</div>
</div>
</div>
</div>
</div>

<br>_______________________________________________<br>
Pw_forum mailing list<br>
<a href="mailto:Pw_forum@pwscf.org">Pw_forum@pwscf.org</a><br>
<a href="http://pwscf.org/mailman/listinfo/pw_forum" rel="noreferrer" target="_blank">http://pwscf.org/mailman/listinfo/pw_forum</a><br></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>
</div>