<div dir="ltr"><div>"underflows"? They should never be a problem, unless you instruct the compiler (by activating some obscure flag) to catch them.<br><br></div>Paolo<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 1, 2016 at 4:48 PM, Sergi Vela <span dir="ltr"><<a href="mailto:sergi.vela@gmail.com" target="_blank">sergi.vela@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear Paolo, <br><br>I have some more details on the problem with DFT+U. The problem arises from underflows somewhere in the QE code, hence the MPI_Bcast message described in previous emails. The attached input crashes systematically, at least in versions 5.1.1, 5.2, 5.4 and 6.0. <br><br>According to the support team of HPC-GRNET, the problem is related neither to MPI (IntelMPI or OpenMPI, various versions of both) nor to the BLAS libraries (MKL, OpenBLAS). For Intel compilers, the flag "-fp-model precise" seems to be necessary (at least for 5.2 and 5.4). GNU compilers, in turn, work: they also report the underflow (a message appears in the job file after completion), but they seem to handle it.<div><br></div><div>The attached input is just an example. Many other jobs on different systems have failed, whereas other closely related inputs have run without any problem. I have the impression that the underflow does not always occur or, at least, is not always enough to crash the job.<br><br>Right now I'm extensively using version 5.1.1 compiled with the GNU 4.9 compiler and it seems to work well.<br><br>That's all the info I can give you about the problem. 
I hope it may eventually help.<br><br>Best,<br>Sergi<p style="color:rgb(51,51,51);font-family:arial,sans-serif;font-size:14px;margin:10px 0px 0px"><br></p></div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-11-23 16:13 GMT+01:00 Sergi Vela <span dir="ltr"><<a href="mailto:sergi.vela@gmail.com" target="_blank">sergi.vela@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear Paolo, <div><br></div><div>Unfortunately, there's not much to report so far. Many "relax" jobs for a system of ca. 500 atoms (including Fe) fail with the same message Davide reported a long time ago:</div><div>_________________<br></div><div><br></div><div><div>Fatal error in PMPI_Bcast: Other MPI error, error stack:</div><div>PMPI_Bcast(2434)........: MPI_Bcast(buf=0x8b25e30, count=7220, MPI_DOUBLE_PRECISION, root=0, comm=0x84000007) failed</div><div>MPIR_Bcast_impl(1807)...:</div><div>MPIR_Bcast(1835)........:</div><div>I_MPIR_Bcast_intra(2016): Failure during collective</div><div>MPIR_Bcast_intra(1665)..: Failure during collective</div></div><div>_________________</div><div><br></div><div>It only occurs on some architectures. The same inputs work for me on two other machines, so it seems to be related to the compilation. The support team of the HPC center I'm working at is trying to identify the problem. Also, it seems to occur randomly, in the sense that some DFT+U calculations of the same type (same cutoffs, pp's, system) show no problem at all. 
</div><div><br></div><div>I'll try to be more helpful next time, and I'll keep you updated.</div><div><br></div><div>Best,</div><div>Sergi</div></div><div class="m_-7887301464784408242HOEnZb"><div class="m_-7887301464784408242h5"><div class="gmail_extra"><br><div class="gmail_quote">2016-11-23 15:21 GMT+01:00 Paolo Giannozzi <span dir="ltr"><<a href="mailto:p.giannozzi@gmail.com" target="_blank">p.giannozzi@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Thank you, but unless an example demonstrating the problem is provided, or at least some information on where this message comes from is supplied, there is close to nothing that can be done.<br><br></div>Paolo<br></div><div class="gmail_extra"><div><div class="m_-7887301464784408242m_-5592783953945793590h5"><br><div class="gmail_quote">On Wed, Nov 23, 2016 at 10:05 AM, Sergi Vela <span dir="ltr"><<a href="mailto:sergi.vela@gmail.com" target="_blank">sergi.vela@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear Colleagues, <div><br></div><div>Just to report that I'm having exactly the same problem with DFT+U. The same message appears randomly, and only when I use the Hubbard term. I could test versions 5.2 and 6.0, and it occurs in both. 
</div><div><br></div><div>All my best,</div><div>Sergi</div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-16 18:43 GMT+02:00 Paolo Giannozzi <span dir="ltr"><<a href="mailto:p.giannozzi@gmail.com" target="_blank">p.giannozzi@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>There are many well-known problems of DFT+U, but none that is known to crash jobs with an obscure message.<br></div><div class="gmail_extra"><br><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="direction:ltr;font-family:Tahoma;color:rgb(0,0,0);font-size:10pt">
Rank 21 [Thu Jul 16 15:51:04 2015] [c4-2c0s15n2] Fatal error in PMPI_Bcast: Message truncated, error stack:<br>
PMPI_Bcast(1615)..................: MPI_Bcast(buf=0x75265e0, count=160, MPI_DOUBLE_PRECISION, root=0, comm=0xc4000000) failed<br>
<span></span></div></div></blockquote><div><br></div></span><div>This signals a mismatch between what is sent and what is received in a broadcast operation. It may be due to an obvious bug, which, however, should show up at the first iteration, not after XX. Apart from compiler or MPI library bugs, another reason is the one described in sec. 8.3 of the developer manual: different processes following different execution paths. From time to time, cases like this are found (the latest occurrence was in the band parallelization of exact exchange) and easily fixed. Unfortunately, finding them (that is: where this happens) typically requires painstaking parallel debugging.<span class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917m_-9086560091869352386HOEnZb"><font color="#888888"><br><br></font></span></div><span class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917m_-9086560091869352386HOEnZb"><font color="#888888"><div>Paolo<span class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917HOEnZb"><font color="#888888"><br></font></span></div></font></span></div></div><span class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917HOEnZb"><font color="#888888"><span class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917m_-9086560091869352386HOEnZb"><font color="#888888"><div class="gmail_extra">-- <br><div class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917m_-9086560091869352386m_-7569845848015661181gmail_signature"><div dir="ltr"><div><div dir="ltr"><span><span><font color="#888888">Paolo Giannozzi, Dept. Chemistry&Physics&Environment,<br>
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>
Phone <a href="tel:%2B39-0432-558216" value="+390432558216" target="_blank">+39-0432-558216</a>, fax <a href="tel:%2B39-0432-558222" value="+390432558222" target="_blank">+39-0432-558222</a></font></span></span></div></div></div></div>
</div></font></span></font></span></div><span class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917HOEnZb"><font color="#888888">
<br>______________________________<wbr>_________________<br>
Pw_forum mailing list<br>
<a href="mailto:Pw_forum@pwscf.org" target="_blank">Pw_forum@pwscf.org</a><br>
<a href="http://pwscf.org/mailman/listinfo/pw_forum" rel="noreferrer" target="_blank">http://pwscf.org/mailman/listi<wbr>nfo/pw_forum</a><br></font></span></blockquote></div><br></div></div>
</blockquote></div><br><br clear="all"><span class="HOEnZb"><font color="#888888"><br></font></span></div></div><span class="HOEnZb"><font color="#888888"><span class="m_-7887301464784408242m_-5592783953945793590HOEnZb"><font color="#888888">-- <br></font></span><div class="m_-7887301464784408242m_-5592783953945793590m_5943795887404165917gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><span class="m_-7887301464784408242m_-5592783953945793590HOEnZb"><font color="#888888">Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,</font></span><span><br>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>Phone <a href="tel:%2B39-0432-558216" value="+390432558216" target="_blank">+39-0432-558216</a>, fax <a href="tel:%2B39-0432-558222" value="+390432558222" target="_blank">+39-0432-558222</a><br><br></span></div></div></div></div></div>
</font></span></div><span class="HOEnZb"><font color="#888888">
</font></span></blockquote></div><span class="HOEnZb"><font color="#888888"><br></font></span></div>
</div></div></blockquote></div><br></div>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>
</div>