[Pw_forum] Can I get away with restart_mode="restart" in this case? max_seconds did not end the calculation right.

Paolo Giannozzi p.giannozzi at gmail.com
Mon Feb 15 18:37:42 CET 2016


The check that "max_seconds" have elapsed is done at the end of each single
diagonalization, so if the latter takes "some_seconds", the check may be
triggered in the worst case when "max_seconds + some_seconds" have elapsed.
Since it may take "some_more_seconds" to write data to disk, if you are out
of luck, "max_seconds + some_seconds + some_more_seconds" will exceed the
maximum allowed time by the batch queue (or, more exactly, the time after
which the batch queue realizes that you are out of time: in your run,
86427s, or 27s more than the wall time limit, 86400).
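
As a rough illustration of this time budget, here is a minimal Python
sketch. It is not something pw.x provides: the function name
safe_max_seconds and the step and write timings in the usage example are
hypothetical, chosen only to show the arithmetic. The three constants are
the values quoted in this thread.

```python
# Values taken from this thread.
WALLTIME_LIMIT = 86400   # batch queue limit, in seconds
MAX_SECONDS = 84600      # max_seconds set in the &CONTROL namelist
KILLED_AT = 86427        # walltime reported by PBS when the job was killed

# Margin left between max_seconds and the queue limit.
margin = WALLTIME_LIMIT - MAX_SECONDS            # 1800 s

# Time that had already passed after max_seconds when the job was killed.
# "some_seconds + some_more_seconds" was therefore at least this large.
elapsed_past_max = KILLED_AT - MAX_SECONDS       # 1827 s


def safe_max_seconds(walltime_limit, worst_step_seconds, write_seconds):
    """Return the largest max_seconds (hypothetical helper) such that
    max_seconds + worst_step_seconds + write_seconds <= walltime_limit."""
    budget = walltime_limit - worst_step_seconds - write_seconds
    if budget <= 0:
        raise ValueError("one step plus the final write does not fit in the walltime")
    return budget


if __name__ == "__main__":
    print("margin:", margin, "s; elapsed past max_seconds:", elapsed_past_max, "s")
    # Hypothetical timings: a 3000 s worst-case step and 300 s to write data.
    print("suggested max_seconds:", safe_max_seconds(WALLTIME_LIMIT, 3000, 300))
```

The worst-case step and write times have to be estimated from the timings
of an earlier run of the same system; the sketch only makes explicit that
1800 s of margin was not enough here.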

Unfortunately there is no way you can recover your data. And no, there is
no reliable way to ask the operating system "how much time do I have"
before starting a new diagonalization ...

Paolo

On Mon, Feb 15, 2016 at 6:11 PM, Joshua Davis <davis101 at chemistry.msu.edu>
wrote:

> Continued... (Sent before I meant to)
>
> I did try to use disk_io = "high", but I ran into "davcio (10)" read and
> write errors so I just used the default "low" option.  There were wfc files
> written in my outdir too.
>
> Below are most of the control options I used:
>
> &CONTROL
>    title = 'MgB5C2PP_NORMCON_HSE_ec140_5kp_115bnd_1Q',
>    calculation = 'scf',
>    pseudo_dir = './pot',
>    outdir = './scratch',
>    prefix = 'MgB5CPP_NC_PBE_ec140_5kp_115bnd',
>    etot_conv_thr = 1.0D-5,
>    forc_conv_thr = 1.0D-4,
>    verbosity = 'high',
>    wf_collect = .true.,
>    max_seconds = 84600
>  /
>
>  &SYSTEM
>    ibrav = 0,
>    nat = 52,
>    ntyp = 3,
>    ecutwfc = 140,
>    nspin = 1,
>    occupations = 'fixed',
>    nbnd = 115,
>    input_dft = 'hse',
>    screening_parameter = 0.106,
>    nqx1 = 1, nqx2 = 1, nqx3 = 1
>  /
>
>  &ELECTRONS
>    mixing_beta = 0.7,
>    conv_thr = 1.D-8,
>    electron_maxstep = 200
>  /
>
>
> ATOMIC_SPECIES
>  Mg 24.305 Mg.pbe-hgh.UPF
>  B 10.81 B.pbe-hgh.UPF
>  C 12.011 C.pbe-hgh.UPF
>
>
> K_POINTS (automatic)
>  5 5 5  0 0 0
>
> The calculation ended with:
>
>     100 total processes killed (some possibly by mpirun during cleanup)
>
> in the out file, and the following was in the scheduler output file:
>
>     mpirun: killing job...
>
>
> --------------------------------------------------------------------------
>     mpirun noticed that process rank 0 with PID 26679 on node scw-003
> exited on signal 0 (Unknown signal 0).
>
> --------------------------------------------------------------------------
>     =>> PBS: job killed: walltime 86427 exceeded limit 86400
>     mpirun: abort is already in progress...hit ctrl-c again to forcibly
> terminate
>
>
>
> Other info:   The system runs CentOS 6.6, and I am running QE5.3 compiled
> with ifort 13.01
>
> Any help would be much appreciated.
>
> ----------------------------------------------------------------------------------------------------------------
> Joshua D. Davis
>
> Graduate Assistant
> Department of Chemistry
> Michigan State University
>
> -----------------------------------------------------------------------------------------------------------------
>
> On Mon, Feb 15, 2016 at 11:55 AM, Joshua Davis <davis101 at chemistry.msu.edu
> > wrote:
>
>> Dear pwscf forum,
>>
>> I am currently trying to run an HSE calculation on my university's high
>> performance cluster.  To make sure the density and wave-functions are
>> written properly before the scheduled session ends, I usually use
>> max_seconds to stop the calculation.  This time max_seconds did not stop
>> the calculation in time, and the job was ended by the scheduler.  Can I
>> still use the wave-function files even though the calculation did not end
>> right?
>>
>> disk_io is set to the default "low".  I did try to use
>> disk_io = "high", but I ran into "davcio (10)"
>>
>>
>> ----------------------------------------------------------------------------------------------------------------
>> Joshua D. Davis
>>
>> Graduate Assistant
>> Michigan State University
>>
>>
>> -----------------------------------------------------------------------------------------------------------------
>>
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222