[QE-users] Question about restarting relaxation jobs
Pietro Davide Delugas
pdelugas at sissa.it
Mon Jul 8 10:14:06 CEST 2019
Hello
1) and 2) PW writes the restart files only when it terminates before
convergence is reached. This happens when the maximum number of steps
is exceeded (electronic steps during scf, or ionic steps during
structural relaxation), when the execution time exceeds the max_seconds
specified in input, or when the user has stopped the calculation by
creating a file called prefix.EXIT in the outdir.
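As a minimal sketch of the last case, assuming the prefix.EXIT file goes
in the outdir as described above, an empty file is enough to request the
stop. The function name request_clean_stop is hypothetical and not part
of QE:

```python
from pathlib import Path


def request_clean_stop(outdir: str, prefix: str) -> Path:
    """Create an empty <prefix>.EXIT file in outdir and return its path.

    Hypothetical helper: outdir and prefix must match the values used in
    the running calculation's input file.
    """
    exit_file = Path(outdir) / f"{prefix}.EXIT"
    exit_file.touch()
    return exit_file
```

The program only notices the file at its next check, so the stop is not
immediate.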
If restart_mode in &control is set to "restart", pw will try to restart
the relaxation from the last positions saved in the prefix.save
directory, using the last saved charge density and wave functions. If it
finds the restart files it will use them as well. This mechanism works
fine if positions, charge density and wave functions have been saved
regularly, but if the calculation is stopped abruptly, for example by
the job manager, there is no way to prevent the stop from arriving while
the program is writing these data.
The safer way to go when you are using a job manager is to set the
max_seconds variable to a value sufficiently lower than the time
allocated by the job manager. The difference between these two times
should be enough to let the program reach one of the checkpoints at
which, during execution, it checks whether the execution time has
exceeded max_seconds or whether the user has created a prefix.EXIT
file. To estimate how large the difference between max_seconds and the
scheduled execution time should be, check how long the program takes to
complete an scf loop. This will be a very safe estimate; you could
reduce this time significantly and things should still work.
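The arithmetic can be sketched as follows. This is only an illustration
of the rule of thumb above: the function name safe_max_seconds and the
one-hour scf loop time are assumptions, so measure the loop time from
your own output.

```python
def safe_max_seconds(walltime_s: int, scf_loop_s: int, extra_s: int = 0) -> int:
    """Return a max_seconds value that leaves room for one full scf loop.

    walltime_s : time allocated by the job manager, in seconds
    scf_loop_s : measured duration of one scf loop, in seconds
    extra_s    : optional additional margin, in seconds
    """
    margin = scf_loop_s + extra_s
    if margin >= walltime_s:
        raise ValueError("margin is not smaller than the allocated walltime")
    return walltime_s - margin


# One week of walltime, assuming an scf loop takes about one hour.
print(safe_max_seconds(604800, 3600))  # prints 601200
```

The result is what you would put as max_seconds in &control, with the
job manager still being asked for the full walltime.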
3) I don't understand what you want to do. You create the prefix.EXIT
file when you want to stop your calculation and you want it to finish
smoothly, saving all restart information so that it can restart from
more or less the point at which it was interrupted. It makes no sense
to rename the output file as prefix.EXIT, because it will make the
program stop as soon as a checkpoint detects the file, and the file
will be deleted. The only things that you have to do when restarting a
calculation are:
* Specify restart_mode = 'restart' in the input.in file
* Take care that the information saved in output.out is not
overwritten by the new execution: either use something like
mpirun pw.x < input.in >> output.out, which appends the new output
to the old one, or redirect the output to a file with a different name
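The first point can also be done by script. What follows is a rough
text-level sketch, not a namelist parser: it assumes the input has a
&control section, and it does not know about commented-out lines. The
function name with_restart_mode is hypothetical.

```python
import re


def with_restart_mode(input_text: str) -> str:
    """Return input_text with restart_mode = 'restart' set in &control."""
    setting = "restart_mode = 'restart'"
    # Replace an existing restart_mode assignment, whatever its value.
    existing = re.compile(r"restart_mode\s*=\s*['\"][^'\"]*['\"]", re.IGNORECASE)
    if existing.search(input_text):
        return existing.sub(setting, input_text, count=1)
    # Otherwise add the setting on a new line right after the &control header.
    header = re.compile(r"^([ \t]*&control\b.*)$", re.IGNORECASE | re.MULTILINE)
    new_text, n = header.subn(
        lambda m: m.group(1) + "\n    " + setting, input_text, count=1
    )
    if n == 0:
        raise ValueError("no &control namelist found in the input")
    return new_text
```

The edited text would then be written back to input.in before running
pw.x again with the output appended as shown in the second point.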
4) outdir must be the same. If you want to use a different one, you
have to create the new outdir before restarting and copy into it all
the data of the previous calculation, i.e. the prefix.save directory.
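A minimal sketch of that copy, assuming the prefix.save directory is all
that has to be moved, as stated above. The function name copy_save_dir
is hypothetical:

```python
import shutil
from pathlib import Path


def copy_save_dir(old_outdir: str, new_outdir: str, prefix: str) -> Path:
    """Create new_outdir and copy <prefix>.save from old_outdir into it."""
    src = Path(old_outdir) / f"{prefix}.save"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} does not exist")
    Path(new_outdir).mkdir(parents=True, exist_ok=True)
    dst = Path(new_outdir) / f"{prefix}.save"
    # copytree raises if dst already exists, so an existing copy in the
    # new outdir is never silently overwritten.
    shutil.copytree(src, dst)
    return dst
```

After the copy, outdir in the input file has to point to the new
directory.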
5) don't complicate things too much
Pietro
On 7/6/19 3:59 PM, Yeon, Jejoon wrote:
>
> Hello
>
>
> I have very little experience using QE, so please excuse my
> beginner question. I'm about to start the relaxation of a big crystal
> structure, and I wish to make my QE relaxation jobs ready for restart.
> Here are my questions:
>
>
> 1) According to the "restarting" section of the manual
> (https://www.quantum-espresso.org/Doc/pw_user_guide/node20.html), it
> seems that QE does not create a dedicated restart file. Is this
> correct?
>
>
> 2) If I set the "max_seconds" option to 604800 seconds (1 week), and
> request 1 week of wall time from the server, will my calculation jobs
> be ready to restart after 1 week? (1 week is just an example, but our
> server cluster has a maximum walltime limit, and I don't think any of
> my relaxation runs will finish within that time.) Also, is the
> "max_seconds" option required in order to restart?
>
> 3) When I execute QE in the submit script, I use something similar to:
> mpirun pw.x < input.in > output.out
> In this case, if the relaxation job is killed due to the wall time
> limit (without setting max_seconds), can I just change the name of
> output.out to prefix.EXIT (of course I set prefix in the input
> file), then include restart_mode = "restart" in the input file and
> submit a job for restart?
> I have old files from jobs that ended on reaching the wall time limit
> without the "max_seconds" option, and I'm curious whether I can use
> those files to restart.
>
> 4) I also use the outdir option in the input file. Should the outdir
> option be the same when restarting?
>
> 5) Are there any other things or useful hints that I need to consider
> when restarting?
>
> Thank you
>
>