[Pw_forum] ph.x: Avoiding the recalculation of the band structure in distributed phonon dispersion jobs

Karttunen Antti antti.j.karttunen at jyu.fi
Wed Feb 27 12:10:17 CET 2013


Dear Andrea,

I'm glad to hear that resetting current_q is no problem. While running some tests today, I realized that there is one more point I had not considered in the new only_init approach. We run the individual (q,irr) grid jobs in serial mode, since this is the simplest to set up on a grid. However, it would be really helpful to run the only_init step locally in parallel, because in serial mode the epsilon + band structure calculation for the largest systems can take several days. So I tried running only_init in parallel and the (q,irr) jobs in serial, but then ph.x fails for q>1, because the parallel only_init run writes a separate wfc file for each process while openfilq looks for a single wfc file. My naive first attempt was to modify run_pwscf:
  twfcollect=.FALSE.
  IF (only_init) twfcollect=.TRUE.
  CALL punch( 'all' )
but I realized that this just writes the data to _ph0/qdir/prefix.save in the wf_collect format and does not produce the _ph0/qdir/prefix.wfc file that openfilq expects.

I wonder whether there would be a simple way to
a) make run_pwscf write the _ph0/qdir/prefix.wfc file for a parallel only_init job, or
b) make openfilq (and phq_init) read wf_collect-style wavefunction data for q>1 if there is no _ph0/qdir/prefix.wfc file (or if a keyword tells it to)?

Or would this just complicate things too much? I guess the latter option could be considered as an "internal" wf_collect option for ph.x, resulting in maximum flexibility.
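
Until something like a) or b) exists, a grid script could at least detect the mismatch before submitting the serial (q,irr) jobs. The following is only a sketch: the function name check_wfc is made up for illustration, and the numbered per-process file names (prefix.wfc1, prefix.wfc2, ...) are an assumption about what a parallel run leaves behind, so adjust the pattern to what you actually see in your qdirs.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of ph.x): succeeds only if the
# single wavefunction file that a serial job looks for is present.
check_wfc() {
    qdir=$1      # e.g. _ph0/$prefix.q_2
    prefix=$2
    if [ -f "$qdir/$prefix.wfc" ]; then
        echo "ok: $qdir/$prefix.wfc"
        return 0
    fi
    # Assumption: a parallel run writes numbered files prefix.wfc1, prefix.wfc2, ...
    n=0
    for f in "$qdir/$prefix".wfc[0-9]*; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "missing: $qdir/$prefix.wfc ($n per-process wfc files found)"
    return 1
}
```

Calling check_wfc for every qdir right after the only_init run would flag the q-points whose serial jobs are going to fail in openfilq, before any grid time is spent on them.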

Best wishes,
Antti

-- 
Dr. Antti Karttunen
Department of Chemistry
University of Jyväskylä, Finland
Tel: +358-50-3473475
WWW: http://www.iki.fi/ankarttu 


-----Original Message-----
From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On Behalf Of Andrea Dal Corso
Sent: Wednesday, February 27, 2013 12:13 PM
To: PWSCF Forum
Subject: Re: [Pw_forum] ph.x: Avoiding the recalculation of the band structure in distributed phonon dispersion jobs


On Wed, 2013-02-27 at 06:55 +0000, Karttunen Antti wrote:
> Dear Andrea,
> 
> Thank you very much for the bug fix and introducing the low_directory_check input variable. Now the process goes very smoothly and we can avoid all the unnecessary band calculations in the future.
> 
> I noticed that there is still a problem with the GRID_example run_example_3: looking at the reference output files, epsilon and bands are actually recalculated at every q. I ran the example, and the cause seems to be the management of the temporary directories. The example actually runs nicely if one completely omits the creation of the separate $q.$irr directories and just runs with one single _ph0 directory containing one $prefix.phsave and all the qdirs.
> 
> I also noticed that the run_example_3 always tries to keep the qdir of the last q-point in the current temp directory:
>   cp -r $TMP_DIR/_ph0/$PREFIX.q_8 $TMP_DIR/$q.$irr/_ph0/
> I guess the reason is that, without this, ph.x crashes for q<8 because seqopn fails for $prefix.q_8/recover? I encountered this in my own tests, too. It seems that after the only_init run, CURRENT_Q in status_run.xml is set to the last q-point, and ph.x then expects the $prefix.q_8 directory to be present in the subsequent (q,irr) calculations. I would rather not copy all qdirs into every (q,irr) _ph0 directory, so after the only_init run I will reset CURRENT_Q to 1 in my scripts, for example with something like
> 
> sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' _ph0/$prefix.phsave/status_run.xml
> 
> works nicely. Or maybe ph.x could reset CURRENT_Q to 1 at the end of a successful only_init run? But this might have side effects I'm not aware of, so I'm also fine with using the above script. Anyway, thanks a lot for all the great work on the grid implementation; this will enormously speed up our work on the phonon calculations of large systems.
> 
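For readers who want to try the sed one-liner quoted above before pointing it at real data, here is a small self-contained sketch. The XML layout of the stand-in file (each value on its own line between the opening and closing tags, plus an invented OTHER_FIELD entry) is an assumption made for illustration and is not copied from a real ph.x status_run.xml. The -r and -i options are used as in GNU sed.

```shell
#!/bin/sh
# Reset the value inside <CURRENT_Q>...</CURRENT_Q> to 1 (GNU sed: -r, -i).
reset_current_q() {
    sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' "$1"
}

# Stand-in file; the layout and OTHER_FIELD are assumptions for this demo.
tmp=$(mktemp -d)
cat > "$tmp/status_run.xml" <<'EOF'
<Root>
  <CURRENT_Q type="integer" size="1">
         8
  </CURRENT_Q>
  <OTHER_FIELD type="integer" size="1">
         12
  </OTHER_FIELD>
</Root>
EOF

reset_current_q "$tmp/status_run.xml"
cat "$tmp/status_run.xml"
```

The address range limits the substitution to the lines between the two CURRENT_Q tags, and the pattern only matches digits at the end of a line, so neither the size="1" attribute nor the value in the other element is touched.
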
The script still had some problems; it should be OK now. Resetting
current_q is also fine, and I have now committed the change.

The reason for having different directories $q.$irr is that the GRID
example should also work on different machines that do not share the
same disk, but they are not necessary when you work with many
CPUs that share the same disk.

Andrea






