[Pw_forum] Restart pw.x in a different machine

Malicious Scientist scientist.malicious at gmail.com
Wed Mar 9 18:45:30 CET 2016


Dear Joshua,

Thanks for your tip. Unfortunately, restarting with a different number of
processes is not working for me. I want to migrate the execution between
different machines, but first I'm trying to stop the computation and
restart it on the same machine, just with a different number of processes.
If I succeed in this scenario, I'll move on to migration. The version I'm
working with is the latest, 5.3.0, compiled on a CentOS 7 machine with
gfortran 4.8.5 and OpenMPI 1.10.0.

The CONTROL namelist of my input file is like this:

&CONTROL
  prefix      = "migration",
  restart_mode = "from_scratch",
  wf_collect  = .TRUE.,
  outdir      = "./scratch/",
  pseudo_dir  = "./pseudopotentials.d",
/
(you can find the full version here: http://pastebin.com/rxN7KCq3)

I started the execution with the following command:

$ mpirun -np 2 ~/quantum/install/pw.x -inp test_4.in > test_4.out

I left it running for a few minutes. Then, I stopped the calculation with:

$ touch migration.EXIT

In the output file test_4.out, I can see that the execution reached the
sixth iteration:

iteration #  6     ecut=    25.00 Ry     beta=0.30
     Davidson diagonalization with overlap
     ethr =  2.90E-04,  avg # of iterations =  1.0

(full output here: http://pastebin.com/8YhkWTmr)

From a previous run, I know that there are 28 iterations. After that, I
changed the CONTROL namelist to this:

&CONTROL
  prefix      = "migration",
  restart_mode = "restart",
  wf_collect  = .TRUE.,
  outdir      = "./scratch/",
  pseudo_dir  = "./pseudopotentials.d",
/
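
The only difference between the two namelists is the restart_mode value. A
minimal sketch of making that edit from the shell, assuming restart_mode
sits on its own line as in the namelists in this message; the helper name
set_restart_mode is hypothetical and not part of Quantum ESPRESSO:

```shell
#!/bin/sh
# Hypothetical helper (not part of Quantum ESPRESSO): rewrite the value of
# restart_mode in a pw.x input file, leaving every other line untouched.
# Usage: set_restart_mode <input-file> <from_scratch|restart>
set_restart_mode() {
    infile=$1
    mode=$2
    tmp="${infile}.tmp"
    # Keep the 'restart_mode = ' prefix (with its spacing) and replace the
    # rest of the line with the new quoted value and a trailing comma.
    sed "s/^\([[:space:]]*restart_mode[[:space:]]*=[[:space:]]*\).*/\1\"${mode}\",/" \
        "$infile" > "$tmp" && mv "$tmp" "$infile"
}
```

With this, `set_restart_mode test_4.in restart` would switch the input file
used in this message from the first namelist to the second.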

I restarted the execution with the following command:

$ mpirun -np 4 ~/quantum/install/pw.x -inp test_4.in > test_4_migration.out

As you can see, instead of 2 processes, I'm using 4 in the second run.
Using the Linux tool 'top', I can see that four processes were created. The
program seems to find the right iteration, since the output file
test_4_migration.out contains the following:

Starting wfc from file
     Calculation restarted from scf iteration #     7
     total cpu time spent up to now is        3.3 secs
     per-process dynamical memory:    44.9 Mb
(full output here: http://pastebin.com/GfBBqxYJ)

But even after several minutes, no new iterations are appended to the file.
And no error messages either. Am I missing something?
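
One way to check whether a run is advancing is to read the last SCF
iteration number from its output file. A minimal sketch, assuming the
"iteration #" line format shown in the output excerpt earlier in this
message; the function name last_iteration is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the number of the last SCF iteration logged in
# a pw.x output file, or nothing if no iteration line is present.
# Only lines that begin with "iteration #" (after optional indentation) are
# considered, so "Calculation restarted from scf iteration # N" is ignored.
last_iteration() {
    grep '^[[:space:]]*iteration #' "$1" |
        tail -n 1 |
        sed 's/^[[:space:]]*iteration #[[:space:]]*\([0-9][0-9]*\).*/\1/'
}
```

Calling `last_iteration test_4_migration.out` twice, a few minutes apart,
would show whether the counter has moved.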

Cheers,

---------------------------------------------------
Name: Joaquim José Xavier
Institution: Faculdade de Educação, Ciências, e Letras do Sertão Central -
Quixadá - Ceará - Brasil
http://www.uece.br/feclesc/
---------------------------------------------------

On Tue, Mar 8, 2016 at 5:12 PM, Joshua Davis <davis101 at chemistry.msu.edu>
wrote:

> Dear Joaquim,
>
> you may want to look up the "wf_collect" option under &CONTROL
>
>
> http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_PW.html#__top__
>
> Joshua Davis
> Michigan State University
>
> On Tue, Mar 8, 2016 at 2:11 PM Malicious Scientist <
> scientist.malicious at gmail.com> wrote:
>
>> Dear Nicola,
>>
>> Sorry, my mistake.
>>
>> ---------------------------------------------------
>> Name: Joaquim José Xavier
>> Institution: Faculdade de Educação, Ciências, e Letras do Sertão Central
>> - Quixadá - Ceará - Brasil
>> http://www.uece.br/feclesc/
>> ---------------------------------------------------
>>
>> On Tue, Mar 8, 2016 at 3:42 PM, Nicola Marzari <nicola.marzari at epfl.ch>
>> wrote:
>>
>>>
>>> Dear Malicious,
>>>
>>> PLEASE see the posting guidelines:
>>> http://www.quantum-espresso.org/forum/#1.0
>>>
>>> *Sign your post with your name and affiliation.*
>>>
>>> nicola
>>>
>>>
>>>
>>> On 08/03/2016 19:37, Malicious Scientist wrote:
>>> > Hello Community,
>>> >
>>> > I would like to know if it is possible to stop a pw.x run, copy the
>>> > files to a different machine, and then restart the computation.
>>> >
>>> > For example, to stop the execution, I would create a $prefix.EXIT file
>>> > in the working directory (as described at
>>> >
>>> http://www.quantum-espresso.org/wp-content/uploads/Doc/pw_user_guide/node19.html
>>> ).
>>> >
>>> > After that, I would copy the entire working directory, including the
>>> > scratch dir, to a remote server with the same version of QE installed.
>>> > Then I would restart the computation, setting the 'restart_mode' flag
>>> > to 'restart' in the CONTROL namelist.
>>> >
>>> > Is this supposed to work? If so, may I restart the computation with a
>>> > different number of CPUs?
>>> >
>>> > Thank you for your attention.
>>> >
>>> >
>>> > _______________________________________________
>>> > Pw_forum mailing list
>>> > Pw_forum at pwscf.org
>>> > http://pwscf.org/mailman/listinfo/pw_forum
>>> >
>>>
>>> --
>>> ----------------------------------------------------------------------
>>> Prof Nicola Marzari, Chair of Theory and Simulation of Materials, EPFL
>>> Director, National Centre for Competence in Research NCCR MARVEL, EPFL
>>> http://theossrv1.epfl.ch/Main/Contact http://nccr-marvel.ch/en/project
>>>
>>
>
>
>