[Pw_forum] premature termination of neb.x

Paolo Giannozzi p.giannozzi at gmail.com
Fri Sep 23 18:57:09 CEST 2016


On Fri, Sep 23, 2016 at 5:48 PM, Francesco Pelizza <
francesco.pelizza at strath.ac.uk> wrote:


> After 7 days, neb.x stops working: no closing/cleanup messages,
> nothing else, it just stops.
>

Reproducibly, for all jobs, after exactly 7 days, and never 9 or 4? Under
which conditions: serial, parallel, with image parallelization, ...?

A parallel code may hang if two processes that should proceed in step
for some reason don't. This may be caused by a subtle buildup of numerical
differences, coupled with replicated checks (each process evaluating the
same test on its own copy of the data); or by a process dying for
whatever reason.
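
As an illustration of the first mechanism, here is a minimal sketch in plain
Python (no MPI, nothing taken from the neb.x source; all names are
hypothetical). Two simulated processes add the same numbers in a different
order, then each runs the same threshold check on its own result. The last
step, letting one process decide for all, is a common remedy and an
assumption of this sketch, not something stated in this thread.

```python
# Pure-Python simulation of a "replicated check" going wrong.
# No MPI is used; the two "ranks" are just two variables.

def local_sum(values):
    """Add the values in the order given, as one process might."""
    total = 0.0
    for v in values:
        total += v
    return total

THRESHOLD = 0.6  # hypothetical convergence threshold

# Same numbers, different summation order on each simulated process.
rank0_value = local_sum([0.3, 0.2, 0.1])  # 0.6
rank1_value = local_sum([0.1, 0.2, 0.3])  # 0.6000000000000001

# Replicated check: each process tests its own copy of the value.
rank0_converged = rank0_value <= THRESHOLD
rank1_converged = rank1_value <= THRESHOLD

# If the decisions differ, one process leaves the loop while the other
# keeps waiting for it in the next collective operation: a hang.
decisions_differ = rank0_converged != rank1_converged

# Assumed remedy: one process decides and the others adopt its decision
# (in a real MPI code this would be a broadcast from that process).
shared_decision = rank0_converged
rank1_converged_fixed = shared_decision

print("rank 0 value:", repr(rank0_value), "converged:", rank0_converged)
print("rank 1 value:", repr(rank1_value), "converged:", rank1_converged)
print("decisions differ:", decisions_differ)
print("after sharing the decision:", shared_decision, rank1_converged_fixed)
```

The difference between the two sums is one unit in the last place, which is
enough to flip a comparison that sits exactly at the threshold.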

Paolo


>
> It happens with several systems. It is not a problem of convergence of a
> particular image or anything else; the output file is just
> interrupted/truncated.
>

> I can easily restart the job anyway, so losing CPU time is not the issue,
> but when you have to queue for HPC time it becomes a problem.
>
>
> Has anybody encountered this problem? Is it a known problem?
>
> Any feedback?
>
>
> Thank you
>
>
> Francesco Pelizza
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222