[Pw_forum] ph.x: Avoiding the recalculation of the band structure in distributed phonon dispersion jobs

Andrea Dal Corso dalcorso at sissa.it
Wed Feb 27 14:29:31 CET 2013


On Wed, 2013-02-27 at 11:10 +0000, Karttunen Antti wrote:
> Dear Andrea,
> 
> I'm glad to hear that resetting current_q is no problem. While running some tests today, I realized there is one more point I had not considered in the new only_init approach. We run the individual (q,irr) grid jobs in serial mode, since this is the simplest setup in a grid. However, it would be really helpful to run the only_init step locally in parallel, because in serial mode the epsilon and band-structure calculations for the largest systems can take several days. So I tried running only_init in parallel and the (q,irr) jobs in serial. Then ph.x fails for q>1, because the only_init run writes a separate wfc file for each parallel process, while openfilq looks for a single wfc file. My naive first attempt was to modify run_pwscf:
>   twfcollect=.FALSE.
>   IF (only_init) twfcollect=.TRUE.
>   CALL punch( 'all' )
> but I realized that this only writes the data to _ph0/qdir/prefix.save in the wf_collect format and does not produce the _ph0/qdir/prefix.wfc file that openfilq expects.
> 
> I wonder if there is a simple way to
> a) make run_pwscf write the _ph0/qdir/prefix.wfc file for a parallel only_init job,
> or b) make openfilq (and phq_init) read wf_collect-style wavefunction data for q>1 if there is no _ph0/qdir/prefix.wfc file (or if a keyword says so)?
> 
> Or would this just complicate things too much? I guess the latter option could be seen as an "internal" wf_collect option for ph.x, giving maximum flexibility.
> 

I am not going to implement this in the SVN version, at least not now.
However, it seems that if you reopen the wavefunctions after saving them
with twfcollect=.true., using something like:

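     CALL punch( 'all' )
     IF (only_init) THEN
        ! Free the pw.x data held in memory and close its files
        CALL clean_pw( .TRUE. )
        CALL close_files(.true.)
        ! Re-read the data just saved by punch (including the collected
        ! wavefunctions) from the phonon temporary directory
        wfc_dir=tmp_dir_phq
        tmp_dir=tmp_dir_phq
        CALL read_file()
        ! Recompute the small group of q (q /= Gamma, or iq>1 with qplot)
        IF (.NOT.lgamma_iq(iq).OR.(qplot.AND.iq>1)) &
           CALL set_small_group_of_q(nsymq,invsymq,minus_q)
     ENDIF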

then you can run the epsilon calculation and the subsequent ph.x runs
with different numbers of processors. It is really inelegant, and I think
there are better ways to do this, but it seems to work.
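The reason the collected format helps can be shown with a toy model. Per-process files tie the data to the processor count that wrote them. A single collected copy can be split again for any count. In this Python sketch, `distribute` and `collect` are hypothetical names, and the contiguous split is an assumption: it is not how QE actually distributes plane waves or lays out its wfc files.

```python
"""Toy model (NOT QE's real data layout): per-process chunks vs a collected copy."""
from typing import List


def distribute(coeffs: List[complex], nproc: int) -> List[List[complex]]:
    """Split coefficients into nproc contiguous chunks, one per process.

    Stands in for the separate wfc file each parallel process writes;
    the contiguous split is an illustrative assumption.
    """
    base, extra = divmod(len(coeffs), nproc)
    chunks, start = [], 0
    for rank in range(nproc):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more
        chunks.append(coeffs[start:start + size])
        start += size
    return chunks


def collect(chunks: List[List[complex]]) -> List[complex]:
    """Gather all chunks into one processor-independent array (cf. wf_collect)."""
    return [c for chunk in chunks for c in chunk]


if __name__ == "__main__":
    psi = [complex(i, -i) for i in range(10)]
    per_proc = distribute(psi, 4)     # parallel only_init run: 4 pieces
    on_disk = collect(per_proc)       # collected copy saved once
    serial = distribute(on_disk, 1)   # later serial (q,irr) job re-splits it
    print([len(c) for c in per_proc])
    print(serial[0] == psi)
```

A run with a different processor count cannot reuse the four pieces directly. It can always re-split the collected array, which is the role the `read_file()` call plays in the patch above.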

Best wishes,

Andrea



> Best wishes,
> Antti
> 
> -- 
> Dr. Antti Karttunen
> Department of Chemistry
> University of Jyväskylä, Finland
> Tel: +358-50-3473475
> WWW: http://www.iki.fi/ankarttu 
> 
> 
> -----Original Message-----
> From: pw_forum-bounces at pwscf.org [mailto:pw_forum-bounces at pwscf.org] On Behalf Of Andrea Dal Corso
> Sent: Wednesday, February 27, 2013 12:13 PM
> To: PWSCF Forum
> Subject: Re: [Pw_forum] ph.x: Avoiding the recalculation of the band structure in distributed phonon dispersion jobs
> 
> 
> On Wed, 2013-02-27 at 06:55 +0000, Karttunen Antti wrote:
> > Dear Andrea,
> > 
> > Thank you very much for the bug fix and for introducing the low_directory_check input variable. Now the process runs very smoothly, and we can avoid all the unnecessary band calculations in the future.
> > 
> > I noticed that there is still a problem with GRID_example run_example_3: looking at the reference output files, epsilon and the bands are actually recalculated at every q. I ran the example, and there seems to be a problem with the management of the temporary directories. The example actually runs fine if one completely omits the creation of the separate $q.$irr directories and runs with a single _ph0 directory containing one $prefix.phsave and all the qdirs.
> > 
> > I also noticed that run_example_3 always tries to keep the qdir of the last q-point in the current temporary directory:
> >   cp -r $TMP_DIR/_ph0/$PREFIX.q_8 $TMP_DIR/$q.$irr/_ph0/
> > I guess the reason is that, without it, ph.x crashes for q<8 because seqopn fails for $prefix.q_8/recover? I ran into this in my own tests, too. It seems that after the only_init run, CURRENT_Q in status_run.xml is set to the last q-point, so ph.x then expects the $prefix.q_8 directory to be present in the subsequent (q,irr) calculations. Since I don't want to copy all qdirs into every (q,irr) _ph0 directory, I will reset CURRENT_Q to 1 in my scripts after the only_init run, for example with something like
> > 
> > sed -r -i '/<CURRENT_Q/,/<\/CURRENT_Q/s/[[:digit:]]+[[:space:]]*$/1/' _ph0/$prefix.phsave/status_run.xml
> > 
> > works nicely. Or maybe ph.x could reset CURRENT_Q to 1 at the end of a successful only_init run? But this might have side effects I'm not aware of, so I'm also fine with using the script above. Anyway, thanks a lot for all the great work on the grid implementation; it will enormously speed up our phonon calculations on large systems.
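For scripts written in Python rather than shell, the same CURRENT_Q reset can be done with a regular expression. This is a minimal sketch, and it makes two assumptions. The function names `reset_current_q` and `reset_current_q_file` are hypothetical. The sample element layout (attributes, value on its own line) only illustrates the form the sed range expects. Unlike the sed one-liner, this version also handles a value written on the same line as the tags, and it keeps trailing whitespace.

```python
import re
from pathlib import Path

# <CURRENT_Q ...attributes...> whitespace DIGITS whitespace </CURRENT_Q>
_CURRENT_Q = re.compile(r"(<CURRENT_Q\b[^>]*>\s*)\d+(\s*</CURRENT_Q>)")


def reset_current_q(xml_text: str, new_q: int = 1) -> str:
    """Return xml_text with the CURRENT_Q value replaced by new_q.

    A regex is used instead of an XML parser so that the rest of
    status_run.xml is left untouched, as with the sed command.
    """
    new_text, count = _CURRENT_Q.subn(rf"\g<1>{new_q}\g<2>", xml_text, count=1)
    if count != 1:
        raise ValueError("CURRENT_Q element not found")
    return new_text


def reset_current_q_file(path: Path, new_q: int = 1) -> None:
    """Rewrite the file in place, e.g. _ph0/<prefix>.phsave/status_run.xml."""
    path.write_text(reset_current_q(path.read_text(), new_q))


if __name__ == "__main__":
    # Assumed layout, mirroring what the sed range expression targets
    sample = '<CURRENT_Q type="integer" size="1">\n      8\n</CURRENT_Q>\n'
    print(reset_current_q(sample))
```

Raising an error when the element is missing is a deliberate choice. The sed command silently does nothing in that case, which could let a grid job start from the wrong q-point unnoticed.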
> > 
> The script still had some problems; now it should be OK. The reset of
> current_q is also fine, and I have now committed the change.
> 
> The reason for having separate $q.$irr directories is that the GRID
> example should also work on different machines that do not share the
> same disk. The directories are not necessary when you work with many
> CPUs that share the same disk.
> 
> Andrea
> 
> 
> 
> 
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
-- 
Andrea Dal Corso                    Tel. 0039-040-3787428
SISSA, Via Bonomea 265              Fax. 0039-040-3787249
I-34136 Trieste (Italy)             e-mail: dalcorso at sissa.it




