<div dir="ltr"><div>A hard crash after 7 days seems to me the proper punishment for launching a NEB calculation with an initial barrier of 5600 eV. By the way: are you sure that a path length > 100 a.u. is what you want? <br><br></div>Paolo<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 23, 2016 at 7:14 PM, Francesco Pelizza <span dir="ltr"><<a href="mailto:francesco.pelizza@strath.ac.uk" target="_blank">francesco.pelizza@strath.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>The interruption occurs on the 7th day. I can't say whether it is at exactly
168 hours or thereabouts, but so far never with a discrepancy of 2-3
days.</p>
<p><br>
</p>
<p>I use mpirun with 8 or 12 processes, depending on the machine, and I
leave NEB at its default internal settings.<br>
</p>
Attached are the input file and the first output, as it was left by the
interruption.<br>
<br>
The command is normally:<br>
"mpirun -np 8 neb.x -inp input_file.in > output_file.out"<br>
<br>
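<p>Since the exact time of the interruption is uncertain, one way to pin it down is a small wrapper that timestamps the start and end of the run. The following is only a minimal sketch: the function name <code>run_and_log</code>, the log file name and the stand-in command are hypothetical, not part of neb.x or mpirun.</p>
<pre>
```python
import datetime
import subprocess
import sys


def run_and_log(cmd, log_path, stdout_path=None):
    """Run cmd and append start time, end time, elapsed hours and exit code to log_path."""
    start = datetime.datetime.now()
    # Write the start line immediately, so it survives even if the job dies later.
    with open(log_path, "a") as log:
        log.write(f"start {start.isoformat()} cmd {' '.join(cmd)}\n")

    if stdout_path is None:
        result = subprocess.run(cmd)
    else:
        # Same effect as the shell redirection in the command above.
        with open(stdout_path, "w") as out:
            result = subprocess.run(cmd, stdout=out)

    end = datetime.datetime.now()
    hours = (end - start).total_seconds() / 3600.0
    with open(log_path, "a") as log:
        log.write(
            f"end {end.isoformat()} elapsed_hours {hours:.2f} "
            f"returncode {result.returncode}\n"
        )
    return result.returncode, hours


if __name__ == "__main__":
    # Harmless stand-in command (hypothetical); for a real run one would pass
    # ["mpirun", "-np", "8", "neb.x", "-inp", "input_file.in"] and
    # stdout_path="output_file.out" instead.
    code, hours = run_and_log([sys.executable, "-c", "pass"], "neb_timing.log")
    print(code, hours)
```
</pre>
<p>A negative return code means the child was terminated by a signal. If the whole machine goes down, the "end" line is never written, but the "start" line together with the last modification time of the output file still bounds the run length.</p>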
So it could potentially be just a hardware failure? That sounds
reasonable; the only strange thing is that the crash comes after a
week rather than at random times, but I appreciate the information you gave me.<br>
<br>
Thank you<div><div class="h5"><br>
<br>
<br>
<br>
<div>On 23/09/16 17:57, Paolo Giannozzi
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">On Fri, Sep 23, 2016 at 5:48 PM, Francesco Pelizza
<span dir="ltr"><<a href="mailto:francesco.pelizza@strath.ac.uk" target="_blank">francesco.pelizza@strath.ac.u<wbr>k</a>></span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">After 7 days, neb.x
stops working, with no closing/cleanup messages, <br>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">nothing else, it just
stops working.<br>
</blockquote>
<div><br>
</div>
<div>Reproducibly, for all jobs, after exactly 7 days and
never 9 or 4? Under which conditions: serial, parallel,
with image parallelization, ...?<br>
<br>
</div>
<div>A parallel code may hang if two processes that
should proceed in step with each other for some reason don't. This may be
caused by a subtle buildup of numerical differences,
coupled with checks replicated on every process, so that the
processes reach different decisions; or by a process dying for
whatever reason.<br>
<br>
</div>
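<div>The first mechanism can be illustrated without any MPI at all. The sketch below is a toy model, not code from neb.x: two hypothetical "ranks" add the same contributions in a different order, get results that differ in the last digit, and then disagree on a convergence check that each evaluates on its own. The threshold and all names are assumptions made for the illustration.</div>
<pre>
```python
def accumulate(values):
    """Plain left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


THRESHOLD = 0.6  # hypothetical convergence threshold

# Both "ranks" hold the same contributions but add them in a different order,
# as can happen when a parallel reduction combines partial sums differently.
contributions = [0.1, 0.2, 0.3]
error_on_rank = [
    accumulate(contributions),            # rank 0
    accumulate(reversed(contributions)),  # rank 1
]

# Replicated check: every rank decides on its own whether it has converged.
replicated = [THRESHOLD >= err for err in error_on_rank]

# Remedy: a single rank decides and all ranks use that decision
# (a stand-in for broadcasting the flag from rank 0).
agreed = [replicated[0]] * len(error_on_rank)

print("errors:    ", error_on_rank)
print("replicated:", replicated)
print("agreed:    ", agreed)
```
</pre>
<div>When the replicated decisions differ, one rank leaves the loop while the other waits in the next collective call for a partner that never arrives, which is how a job can hang without printing anything.<br>
<br>
</div>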
<div>Paolo<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
It happens with several systems; it is not a problem of
convergence of a<br>
particular image or anything else, the output file is just<br>
interrupted/truncated. <br>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I can easily restart the job anyway; the issue is not
losing CPU<br>
time, but when you have to queue for HPC it becomes a problem.<br>
<br>
<br>
Anyway, has anybody encountered this problem? Is it a known
problem?<br>
<br>
Any feedback?<br>
<br>
<br>
Thank you<br>
<br>
<br>
Francesco Pelizza<br>
<br>
______________________________<wbr>_________________<br>
Pw_forum mailing list<br>
<a href="mailto:Pw_forum@pwscf.org" target="_blank">Pw_forum@pwscf.org</a><br>
<a href="http://pwscf.org/mailman/listinfo/pw_forum" rel="noreferrer" target="_blank">http://pwscf.org/mailman/listi<wbr>nfo/pw_forum</a><br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
-- <br>
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>Paolo Giannozzi, Dip. Scienze Matematiche
Informatiche e Fisiche,<br>
Univ. Udine, via delle Scienze 208, 33100 Udine,
Italy<br>
Phone <a href="tel:%2B39-0432-558216" value="+390432558216" target="_blank">+39-0432-558216</a>, fax <a href="tel:%2B39-0432-558222" value="+390432558222" target="_blank">+39-0432-558222</a><br>
<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. Udine, via delle Scienze 208, 33100 Udine, Italy<br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>
</div>