[Pw_forum] premature termination of neb.x

Paolo Giannozzi p.giannozzi at gmail.com
Sat Sep 24 21:29:14 CEST 2016


A hard crash after 7 days seems to me the proper punishment for launching a
NEB calculation with an initial barrier of 5600 eV. By the way: are you
sure that a path length > 100 a.u. is what you want?

Paolo

On Fri, Sep 23, 2016 at 7:14 PM, Francesco Pelizza <
francesco.pelizza at strath.ac.uk> wrote:

> The interruption occurs on the 7th day. I can't say whether it is at exactly
> 168 hours or thereabouts, but so far never with a discrepancy of 2-3 days.
>
>
> I use mpirun with 8 or 12 MPI processes depending on the machine, and I
> leave NEB at its default internal settings.
> Attached are the input file and the first output, as it was at the interruption.
>
> the command is normally:
> "mpirun -np 8 neb.x -inp input_file.in >output_file.out"
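
One way to narrow down whether the job dies on its own or is killed from outside is to launch the command through a small wrapper that records the elapsed time and the exit status. The following is only a minimal Python sketch and not part of Quantum ESPRESSO; the helper name run_and_log and the file names are hypothetical. Note that mpirun may turn a signal received by one of its children into an ordinary non-zero exit code, so the reported reason is indicative at best.

```python
import signal
import subprocess
import time


def run_and_log(cmd, stdout_path):
    """Run cmd, send its output to stdout_path, and report how it ended.

    Hypothetical helper: returns (elapsed_hours, returncode, reason).
    """
    start = time.time()
    with open(stdout_path, "w") as out:
        proc = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    elapsed_hours = (time.time() - start) / 3600.0

    rc = proc.returncode
    if rc == 0:
        reason = "exited normally"
    elif rc < 0:
        # On POSIX, a negative return code means "killed by signal -rc".
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = str(-rc)
        reason = "killed by signal " + name
    else:
        reason = "exited with error code %d" % rc
    return elapsed_hours, rc, reason


# Example call, commented out so that the sketch does not launch a job by itself:
# hours, rc, reason = run_and_log(
#     ["mpirun", "-np", "8", "neb.x", "-inp", "input_file.in"],
#     "output_file.out")
# print("ran %.1f h, %s" % (hours, reason))
```

If the elapsed time is always very close to 168 hours and the reason is a signal, an external limit (for instance a wall-time or session limit) would be a plausible suspect; a scattered elapsed time would point elsewhere.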
>
> So potentially it could be just a hardware failure? That sounds reasonable.
> The only strange thing is that the crash comes after a week and not at
> random, but I appreciate the information you gave me.
>
> Thank you
>
>
>
>
> On 23/09/16 17:57, Paolo Giannozzi wrote:
>
> On Fri, Sep 23, 2016 at 5:48 PM, Francesco Pelizza <
> francesco.pelizza at strath.ac.uk> wrote:
>
>
>> After 7 days, neb.x stops working: no closing or clean-up messages,
>> nothing else, it just stops.
>>
>
> Reproducibly, for all jobs, after exactly 7 days and never 9 or 4? Under
> which conditions? Serial, parallel, with image parallelization, ...?
>
> A parallel code may hang if any two processes that should go in parallel
> for some reason don't. This may be caused by subtle buildups of numerical
> differences, coupled to replicated checks; or by a process dying for
> whatever reason.
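
The "numerical differences coupled to replicated checks" scenario can be illustrated without MPI. The following is a toy Python sketch, not code from neb.x: two simulated ranks add the same three numbers in a different order, obtain results that differ in the last digit, and therefore disagree on a convergence test. In a real parallel run, the rank that believes it has converged would leave the loop while the other waits forever in the next collective call. All function names and the numbers used are hypothetical and chosen only for illustration.

```python
def local_sum(contributions, reverse):
    """Sum the same numbers in a rank-dependent order (stand-in for a reduction)."""
    ordered = list(reversed(contributions)) if reverse else list(contributions)
    total = 0.0
    for c in ordered:
        total += c
    return total


def replicated_check(contributions, threshold):
    """Every simulated rank evaluates the convergence test on its own sum."""
    rank0 = local_sum(contributions, reverse=False) <= threshold
    rank1 = local_sum(contributions, reverse=True) <= threshold
    return rank0, rank1


def broadcast_check(contributions, threshold):
    """Only rank 0 decides; rank 1 reuses that decision, as after a broadcast."""
    decision = local_sum(contributions, reverse=False) <= threshold
    return decision, decision


contributions = [0.1, 0.2, 0.3]
threshold = 0.6

# The ranks disagree: 0.1+0.2+0.3 rounds differently from 0.3+0.2+0.1.
print(replicated_check(contributions, threshold))  # (False, True)

# Taking the decision on one rank only keeps all ranks in step.
print(broadcast_check(contributions, threshold))
```

The usual remedy is the second variant: take the decision on one process and communicate it to the others, so that every process follows the same branch.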
>
> Paolo
>
>
>>
>> It happens with several systems. It is not a problem of convergence of a
>> particular image or anything else; the output file is just
>> interrupted/truncated.
>>
>> I can easily restart the job anyway, so it is not an issue of losing CPU
>> time, but when you have to queue for HPC it becomes a problem.
>>
>>
>> Anyway, has anybody encountered this problem? Is it a known problem?
>>
>> Any feedback?
>>
>>
>> Thank you
>>
>>
>> Francesco Pelizza
>>
>> _______________________________________________
>> Pw_forum mailing list
>> Pw_forum at pwscf.org
>> http://pwscf.org/mailman/listinfo/pw_forum
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222