[Pw_forum] Can I get away with restart_mode="restart" in this case? max_seconds did not end the calculation right.
Joshua Davis
davis101 at chemistry.msu.edu
Mon Feb 15 19:46:51 CET 2016
Thank you for your help.
----------------------------------------------------------------------------------------------------------------
Joshua D. Davis
davis.d.josh at gmail.com
Cell: (734)707-1790
Graduate Assistant
Department of Chemistry
Michigan State University
578 S. Shaw Lane, room 432
East Lansing, MI 48824
-----------------------------------------------------------------------------------------------------------------
On Mon, Feb 15, 2016 at 12:37 PM, Paolo Giannozzi <p.giannozzi at gmail.com>
wrote:
> The check that "max_seconds" have elapsed is done at the end of each
> single diagonalization, so if the latter takes "many_seconds", the check
> may be triggered in the worst case when "max_seconds + some_seconds" have
> elapsed. Since it may take "some_more_seconds" to write data to disk, if
> you are out of luck, "max_seconds + some_seconds + some_more_seconds" will
> exceed the maximum allowed time by the batch queue (or, more exactly, the
> time after which the batch queue realizes that you are out of time: in your
> run, 86427s, or 27s more than the wall time limit, 86400).
>
> Unfortunately there is no way you can recover your data. And no, there is
> no reliable way to ask the operating system "how much time do I have"
> before starting a new diagonalization ...
>
> Paolo
>
> On Mon, Feb 15, 2016 at 6:11 PM, Joshua Davis <davis101 at chemistry.msu.edu>
> wrote:
>
>> Continued... (Sent before I meant to)
>>
>> I did try to use disk_io = "high", but I ran into "davcio (10)" read and
>> write errors so I just used the default "low" option. There were wfc files
>> written in my outdir too.
>>
>> Below are most of the input options I used:
>>
>> &CONTROL
>> title = 'MgB5C2PP_NORMCON_HSE_ec140_5kp_115bnd_1Q',
>> calculation = 'scf',
>> pseudo_dir = './pot',
>> outdir = './scratch',
>> prefix = 'MgB5CPP_NC_PBE_ec140_5kp_115bnd',
>> etot_conv_thr = 1.0D-5,
>> forc_conv_thr = 1.0D-4,
>> verbosity = 'high',
>> wf_collect = .true.,
>> max_seconds = 84600
>> /
>>
>> &SYSTEM
>> ibrav = 0,
>> nat = 52,
>> ntyp = 3,
>> ecutwfc = 140,
>> nspin = 1,
>> occupations = 'fixed',
>> nbnd = 115,
>> input_dft = 'hse',
>> screening_parameter = 0.106,
>> nqx1 = 1, nqx2 = 1, nqx3 = 1
>> /
>>
>> &ELECTRONS
>> mixing_beta = 0.7,
>> conv_thr = 1.D-8,
>> electron_maxstep = 200
>> /
>>
>>
>> ATOMIC_SPECIES
>> Mg 24.305 Mg.pbe-hgh.UPF
>> B 10.81 B.pbe-hgh.UPF
>> C 12.011 C.pbe-hgh.UPF
>>
>>
>> K_POINTS (automatic)
>> 5 5 5 0 0 0
>>
>> The calculation ended with:
>>
>> 100 total processes killed (some possibly by mpirun during cleanup)
>>
>> in the out file, and the following was in the scheduler output file:
>>
>> mpirun: killing job...
>>
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 26679 on node scw-003
>> exited on signal 0 (Unknown signal 0).
>>
>> --------------------------------------------------------------------------
>> =>> PBS: job killed: walltime 86427 exceeded limit 86400
>> mpirun: abort is already in progress...hit ctrl-c again to forcibly
>> terminate
>>
>>
>>
>> Other info: The system runs CentOS 6.6, and I am running QE5.3 compiled
>> with ifort 13.01
>>
>> Any help would be much appreciated.
>>
>>
>> On Mon, Feb 15, 2016 at 11:55 AM, Joshua Davis <
>> davis101 at chemistry.msu.edu> wrote:
>>
>>> Dear pwscf forum,
>>>
>>> I am currently trying to run an HSE calculation on my university's high
>>> performance cluster. To make sure the density and wave-functions are
>>> written properly before the scheduled session ends, I usually use
>>> max_seconds to stop the calculation. This time max_seconds did not stop
>>> the calculation in time, and the job was ended by the scheduler. Can I
>>> still use the wave-function files even though the calculation did not end
>>> properly?
>>>
>>> disk_io is set to the default "low". I did try to use
>>> disk_io = "high", but I ran into "davcio (10)"
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> Pw_forum mailing list
>> Pw_forum at pwscf.org
>> http://pwscf.org/mailman/listinfo/pw_forum
>>
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>
>
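The timing argument in Paolo's reply (max_seconds + some_seconds + some_more_seconds must stay below the queue limit) can be turned into a small budget calculation. The following is only a sketch: the function name and the example step and write durations are hypothetical, and the real durations would have to be estimated from the timings of an earlier run. Only the three numbers 86400, 84600 and 86427 come from this thread.

```python
def latest_safe_max_seconds(walltime_limit, worst_step_seconds,
                            write_seconds, safety_margin=0):
    """Largest max_seconds that still leaves room for one full
    diagonalization step plus the final write before the queue limit.

    All arguments are in seconds. This helper is hypothetical and is
    not part of Quantum ESPRESSO.
    """
    budget = walltime_limit - worst_step_seconds - write_seconds - safety_margin
    if budget <= 0:
        raise ValueError("wall time is too short for one step plus the final write")
    return budget


# Numbers reported in this thread.
WALLTIME_LIMIT = 86400      # PBS wall time limit
MAX_SECONDS_USED = 84600    # max_seconds in the &CONTROL namelist
KILLED_AT = 86427           # wall time at which PBS killed the job

# Headroom that was left between max_seconds and the queue limit.
margin_used = WALLTIME_LIMIT - MAX_SECONDS_USED

# Time the job kept running after the max_seconds threshold had passed.
observed_overshoot = KILLED_AT - MAX_SECONDS_USED

# Hypothetical durations, for illustration only: a 3600 s worst-case step,
# a 300 s write, and 60 s of extra safety margin.
example = latest_safe_max_seconds(WALLTIME_LIMIT,
                                  worst_step_seconds=3600,
                                  write_seconds=300,
                                  safety_margin=60)

print("headroom left in this run:", margin_used, "s")
print("job still running after:  ", observed_overshoot, "s past max_seconds")
print("example max_seconds:      ", example)
```

In this run the headroom was 1800 s, and the job was still running 1827 s after the max_seconds threshold, so the last step plus the final write needed more time than had been reserved. Choosing max_seconds from an estimate of the longest single step, rather than from a fixed offset, is one way to reduce that risk, although it cannot remove it.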