[Q-e-developers] Bug in image parallelization in PHonon

Paolo Giannozzi p.giannozzi at gmail.com
Fri Sep 23 18:21:23 CEST 2016


Is the following patch doing (that is: stopping) the job?
---
Index: /home/giannozz/trunk/espresso/PHonon/PH/phq_readin.f90
===================================================================
--- /home/giannozz/trunk/espresso/PHonon/PH/phq_readin.f90    (revision 13008)
+++ /home/giannozz/trunk/espresso/PHonon/PH/phq_readin.f90    (working copy)
@@ -679,6 +679,8 @@

   IF(elph.and.nimage>1) call errore('phq_readin',&
        'el-ph with images not implemented',1)
+  IF( fildvscf /= ' ' .and. nimage > 1 ) call errore('phq_readin',&
+       'saving dvscf to file with images not implemented',1)

   IF (elph.OR.fildvscf /= ' ') lqdir=.TRUE.
---

Paolo

On Fri, Sep 23, 2016 at 4:45 PM, Thomas Brumme <thomas.brumme at mpsd.mpg.de>
wrote:

> I meanwhile had a discussion with Lorenzo Paulatto about a similar problem.
>
> I think that it might be a rather specific problem. As soon as I
> parallelize only over
> q points using start_q and last_q there is no problem - also for
> restarting.
>
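For the archive, the q-point-only approach mentioned above can be sketched as an input fragment (start_q and last_q are documented PHonon input variables; the other values are illustrative and follow the example later in this thread):

```
 &inputph
    tr2_ph=1.0d-10,
    prefix='aluminum',
    fildvscf='aldv',
    outdir='./tempdir/',
    fildyn='al.dyn',
    ldisp=.true.,
    start_q=1, last_q=4,   ! compute only q points 1-4 in this run
    nq1=4, nq2=4, nq3=4
 /
```

A second run with start_q=5, last_q=8 would then cover the remaining q points.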
> Using images I can, in principle, even create the full dvscf files,
> without having to
> rerun the calculation without images, using split and cat on the different
> dvscf files
> in the different temp folders. It's tedious, but it works. Yet, in the
> future I will use only the parallelization over q points for the
> calculation of the dvscf.
>
> In summary, the parallelization for PH is not straightforward and I think
> that it
> might help to store, e.g., the dvscf files for different representations
> separately.
> But Lorenzo mentioned that system administrators complain if the number of
> written files is large... It could be helpful to have a summary of what
> can and cannot be done using images... I.e., dvscf (and el-ph) does not
> work if image parallelization is used, especially if the different
> representations
> of one q point are split across different images. For el-ph the code does
> not
> start, but maybe a similar check can be added for the dvscf files?
>
> Well, or maybe not, I don't know :)
>
> On 09/23/2016 04:24 PM, Paolo Giannozzi wrote:
>
> has anybody any idea? P.
>
> On Wed, Sep 14, 2016 at 1:30 PM, Thomas Brumme <thomas.brumme at mpsd.mpg.de>
> wrote:
>
>> Dear all,
>>
>> I think I found a bug in the image parallelization of PH - or I'm doing
>> something wrong.
>> I used version 5.4, but the problem is also present in the 6.0 beta.
>> Maybe someone remembers my email from a few days ago to the regular mailing list
>> concerning
>> the parallelization using the GRID technique - the problem I encounter
>> here is essentially
>> the same. As an example, I use a modified run_example_1 of the
>> Recover_example
>> directory of PH.
>>
>> Description of the problem:
>>
>> 0. (Following the example) I did an scf calculation using 2 CPUs with:
>>
>>   &control
>>      calculation='scf'
>>      restart_mode='from_scratch',
>>      prefix='aluminum',
>>      pseudo_dir = './',
>>      outdir='./tempdir/'
>>   /
>>   &system
>>      ibrav=  2, celldm(1) =7.5, nat= 1, ntyp= 1,
>>      ecutwfc =15.0,
>>      occupations='smearing', smearing='methfessel-paxton', degauss=0.05,
>>      la2F = .true.,
>>   /
>>   &electrons
>>      conv_thr =  1.0d-8
>>      mixing_beta = 0.7
>>   /
>> ATOMIC_SPECIES
>>   Al  26.98 Al.pz-vbc.UPF
>> ATOMIC_POSITIONS
>>   Al 0.00 0.00 0.00
>> K_POINTS {automatic}
>>   16 16 16  0 0 0
>>
>>
>> 1. I'll do the scf calculation using 2 CPUs and:
>>
>>   &control
>>      calculation='scf'
>>      restart_mode='from_scratch',
>>      prefix='aluminum',
>>      pseudo_dir = './',
>>      outdir='./tempdir/'
>>   /
>>   &system
>>      ibrav=  2, celldm(1) =7.5, nat= 1, ntyp= 1,
>>      ecutwfc =15.0,
>>      occupations='smearing', smearing='methfessel-paxton', degauss=0.05
>>   /
>>   &electrons
>>      conv_thr =  1.0d-8
>>      mixing_beta = 0.7
>>   /
>> ATOMIC_SPECIES
>>   Al  26.98 Al.pz-vbc.UPF
>> ATOMIC_POSITIONS
>>   Al 0.00 0.00 0.00
>> K_POINTS {automatic}
>>   8 8 8  0 0 0
>>
>>
>> 2. I'll do a phonon calculation including storing the dvscf files and
>> using images.
>> More specifically I used:
>>
>> mpirun -np 4 ph.x -ni 2 < al.elph.in
>>
>> with al.elph.in given by:
>>
>> Electron-phonon coefficients for Al
>>   &inputph
>>    tr2_ph=1.0d-10,
>>    prefix='aluminum',
>>    fildvscf='aldv',
>>    amass(1)=26.98,
>>    outdir='./tempdir/',
>>    fildyn='al.dyn',
>> !  electron_phonon='interpolated',
>> !  el_ph_sigma=0.005,
>> !  el_ph_nsigma=10,
>> !  recover=.true.
>> !  trans=.false.,
>>    ldisp=.true.
>>    max_seconds=6,
>>    nq1=4, nq2=4, nq3=4
>>   /
>>
>> I used max_seconds in order to simulate the finite run time we have on
>> our HPC.
>> Restarting with recover=.true. works fine... I.e. I used:
>>
>> Electron-phonon coefficients for Al
>>   &inputph
>>    tr2_ph=1.0d-10,
>>    prefix='aluminum',
>>    fildvscf='aldv',
>>    amass(1)=26.98,
>>    outdir='./tempdir/',
>>    fildyn='al.dyn',
>> !  electron_phonon='interpolated',
>> !  el_ph_sigma=0.005,
>> !  el_ph_nsigma=10,
>>    recover=.true.
>> !  trans=.false.,
>>    ldisp=.true.
>>    max_seconds=6,
>>    nq1=4, nq2=4, nq3=4
>>   /
>>
>>
>> 3. Now I want to collect all data using no images:
>>
>> mpirun -np 2 ph.x < al.elph.in
>>
>> with the same input file as given in 2.
>>
>> I get the error "Possibly too few bands at point ..." when the code tries to
>> recalculate the wave functions for the q points which were calculated
>> only on
>> the second image, i.e., for q points 6, 7, and 8.
>>
>> If I check the charge_density.dat files in the subfolders of the q
>> points in the
>> _ph0 directory I find that they're empty. Thus, I copied the q
>> subfolders of the
>> second image by hand to the folder of the first image using:
>>
>> cp -r _ph1/aluminum.q_* _ph0/
>>
>> If I now restart without images, using the input of 2. it works...
>> Everything is fine...
>>
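The manual collection step described above can be sketched as a small shell snippet (a sketch only; paths assume outdir='./tempdir/' with prefix 'aluminum' as in this thread, and that it is run from inside the outdir):

```shell
# Copy every q-point scratch folder written by the second image (_ph1)
# into the first image's scratch area (_ph0), so that a subsequent run
# without images finds a complete data set.
for qdir in _ph1/aluminum.q_*; do
    if [ -d "$qdir" ]; then
        cp -r "$qdir" _ph0/
    fi
done
# afterwards restart without images, e.g.: mpirun -np 2 ph.x < al.elph.in
```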
>>
>> 4. Now I can also calculate the el-ph parameters using the input:
>>
>> Electron-phonon coefficients for Al
>>   &inputph
>>    tr2_ph=1.0d-10,
>>    prefix='aluminum',
>>    fildvscf='aldv',
>>    amass(1)=26.98,
>>    outdir='./tempdir/',
>>    fildyn='al.dyn',
>>    electron_phonon='interpolated',
>>    el_ph_sigma=0.005,
>>    el_ph_nsigma=10,
>> !  recover=.true.
>>    trans=.false.,
>>    ldisp=.true.
>> !  max_seconds=6,
>>    nq1=4, nq2=4, nq3=4
>>   /
>>
>>
>> 5. Another problem I encounter is the following... Suppose the run time
>> is not enough to
>> finish the el-ph calculations, i.e., instead of the input in 4. I use:
>>
>> Electron-phonon coefficients for Al
>>   &inputph
>>    tr2_ph=1.0d-10,
>>    prefix='aluminum',
>>    fildvscf='aldv',
>>    amass(1)=26.98,
>>    outdir='./tempdir/',
>>    fildyn='al.dyn',
>>    electron_phonon='interpolated',
>>    el_ph_sigma=0.005,
>>    el_ph_nsigma=10,
>> !  recover=.true.
>>    trans=.false.,
>>    ldisp=.true.
>>    max_seconds=6,
>>    nq1=4, nq2=4, nq3=4
>>   /
>>
>> The code will stop at a certain point (in my case the 4th q point). If I
>> now restart the calculation
>> using:
>>
>> Electron-phonon coefficients for Al
>>   &inputph
>>    tr2_ph=1.0d-10,
>>    prefix='aluminum',
>>    fildvscf='aldv',
>>    amass(1)=26.98,
>>    outdir='./tempdir/',
>>    fildyn='al.dyn',
>>    electron_phonon='interpolated',
>>    el_ph_sigma=0.005,
>>    el_ph_nsigma=10,
>>    recover=.true.
>>    trans=.false.,
>>    ldisp=.true.
>> !  max_seconds=6,
>>    nq1=4, nq2=4, nq3=4
>>   /
>>
>> I get (again) the error message "Possibly too few bands at point ..."
>> when the code tries to calculate
>> the wave functions for the 4th q point (the one where it stopped before)...
>> All other points are fine...
>>
>>
>> I think that the whole problem is related to the storing of the wave
>> functions and the charge density.
>> Maybe I'm doing something really wrong, but I don't see any obvious
>> error in the input... Also I don't
>> see any input variable for ph which influences the saving of wave
>> functions...
>>
>> Regards
>>
>> Thomas
>>
>> --
>> Dr. rer. nat. Thomas Brumme
>> Max Planck Institute for the Structure and Dynamics of Matter
>> Luruper Chaussee 149
>> 22761 Hamburg
>>
>> Tel:  +49 (0)40 8998 6557
>>
>> email: Thomas.Brumme at mpsd.mpg.de
>>
>> _______________________________________________
>> Q-e-developers mailing list
>> Q-e-developers at qe-forge.org
>> http://qe-forge.org/mailman/listinfo/q-e-developers
>>
>
>
>
>
>
>
>
>
>
>
>
>


-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

