[Q-e-developers] bug image parallelization in PHonon

Brumme, Thomas thomas.brumme at mpsd.mpg.de
Fri Sep 23 19:52:04 CEST 2016


________________________________________
From: q-e-developers-bounces at qe-forge.org on behalf of Paolo Giannozzi [p.giannozzi at gmail.com]
Sent: Friday, 23 September 2016 18:21
To: General discussion list for Quantum ESPRESSO developers
Subject: Re: [Q-e-developers] bug image parallelization in PHonon

Is the following patch doing (that is: stopping) the job?
---
Index: /home/giannozz/trunk/espresso/PHonon/PH/phq_readin.f90
===================================================================
--- /home/giannozz/trunk/espresso/PHonon/PH/phq_readin.f90    (revision 13008)
+++ /home/giannozz/trunk/espresso/PHonon/PH/phq_readin.f90    (working copy)
@@ -679,6 +679,8 @@

   IF(elph.and.nimage>1) call errore('phq_readin',&
        'el-ph with images not implemented',1)
+  IF( fildvscf /= ' ' .and. nimage > 1 ) call errore('phq_readin',&
+       'saving dvscf to file images not implemented',1)

   IF (elph.OR.fildvscf /= ' ') lqdir=.TRUE.
---

Paolo

On Fri, Sep 23, 2016 at 4:45 PM, Thomas Brumme <thomas.brumme at mpsd.mpg.de> wrote:

In the meantime I had a discussion with Lorenzo Paulatto about a similar problem.

I think that it might be a rather specific problem. As soon as I parallelize only over
q points using start_q and last_q there is no problem - also for restarting.

With images I can, in principle, still build the full dvscf files without having to
rerun the calculation without images, by applying split and cat to the partial dvscf
files in the different temp folders. It's tedious but it works. In future, though, I
will use only the parallelization over q points for the calculation of the dvscf.
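
A minimal sketch of the q-point splitting mentioned above, assuming one ph.x
input per q point with start_q = last_q. The namelist values are copied from
the aluminum example quoted further down in this thread; the file names
al.ph.q*.in and the count nq=8 are assumptions (the thread only mentions q
points up to 8), so check the number of q points in your own ph.x output.
The sketch only writes the input files and does not address whether
concurrent jobs can safely share one outdir.

```shell
#!/bin/sh
# Sketch: write one ph.x input per q point, using start_q/last_q instead of images.
# nq is an assumption: the thread mentions q points 6, 7 and 8 for the 4x4x4 grid.
nq=8
q=1
while [ "$q" -le "$nq" ]; do
  # File name al.ph.q<N>.in is hypothetical; the namelist mirrors the thread's input.
  cat > "al.ph.q$q.in" <<EOF
Phonons for Al, q point $q only
 &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   ldisp=.true.,
   nq1=4, nq2=4, nq3=4,
   start_q=$q, last_q=$q
 /
EOF
  # Each input could then be run as its own job, for example:
  #   mpirun -np 2 ph.x < al.ph.q$q.in > al.ph.q$q.out
  q=$((q + 1))
done
```

Each generated file restricts the run to a single q point, so a job that hits
the wall time only has to be repeated for that point.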

In summary, the parallelization for PH is not straightforward, and I think that it
might help to store, e.g., the dvscf files for different representations separately.
But Lorenzo mentioned that system administrators complain if the number of
written files is large... It would be helpful to have a kind of summary of
what can and cannot be done using images. For example, dvscf (and el-ph) does
not work if image parallelization is used, especially if the different
representations of one q point are split across different images. For el-ph the
code does not start, but maybe a similar check can be added for the dvscf files?

Well, or maybe not, I don't know :)

On 09/23/2016 04:24 PM, Paolo Giannozzi wrote:
has anybody any idea? P.

On Wed, Sep 14, 2016 at 1:30 PM, Thomas Brumme <thomas.brumme at mpsd.mpg.de> wrote:
Dear all,

I think I found a bug in the image parallelization of PH - or I'm doing
something wrong.
I used version 5.4, but the problem is also there if I use the 6.0 beta.
Maybe someone remembers my email a few days ago to the normal email list
concerning the parallelization using the GRID technique - the problem I
encounter here is essentially the same. As an example, I use a modified
run_example_1 of the Recover_example directory of PH.

Description of the problem:

0. (Following the example) I did an scf calculation using 2 CPUs with:

  &control
     calculation='scf'
     restart_mode='from_scratch',
     prefix='aluminum',
     pseudo_dir = './',
     outdir='./tempdir/'
  /
  &system
     ibrav=  2, celldm(1) =7.5, nat= 1, ntyp= 1,
     ecutwfc =15.0,
     occupations='smearing', smearing='methfessel-paxton', degauss=0.05,
     la2F = .true.,
  /
  &electrons
     conv_thr =  1.0d-8
     mixing_beta = 0.7
  /
ATOMIC_SPECIES
  Al  26.98 Al.pz-vbc.UPF
ATOMIC_POSITIONS
  Al 0.00 0.00 0.00
K_POINTS {automatic}
  16 16 16  0 0 0


1. I then did a second scf calculation using 2 CPUs with:

  &control
     calculation='scf'
     restart_mode='from_scratch',
     prefix='aluminum',
     pseudo_dir = './',
     outdir='./tempdir/'
  /
  &system
     ibrav=  2, celldm(1) =7.5, nat= 1, ntyp= 1,
     ecutwfc =15.0,
     occupations='smearing', smearing='methfessel-paxton', degauss=0.05
  /
  &electrons
     conv_thr =  1.0d-8
     mixing_beta = 0.7
  /
ATOMIC_SPECIES
  Al  26.98 Al.pz-vbc.UPF
ATOMIC_POSITIONS
  Al 0.00 0.00 0.00
K_POINTS {automatic}
  8 8 8  0 0 0


2. I then did a phonon calculation, storing the dvscf files and using images.
More specifically I used:

mpirun -np 4 ph.x -ni 2 < al.elph.in

with al.elph.in given by:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
!  electron_phonon='interpolated',
!  el_ph_sigma=0.005,
!  el_ph_nsigma=10,
!  recover=.true.
!  trans=.false.,
   ldisp=.true.
   max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /

I used max_seconds in order to simulate the finite run time we have on our HPC.
Restarting with recover=.true. works fine... I.e. I used:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
!  electron_phonon='interpolated',
!  el_ph_sigma=0.005,
!  el_ph_nsigma=10,
   recover=.true.
!  trans=.false.,
   ldisp=.true.
   max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /


3. Now I want to collect all data using no images:

mpirun -np 2 ph.x < al.elph.in

with the same input file as given in 2.

I get the error "Possibly too few bands at point ..." once the code wants to
recalculate the wave functions for the q points which were calculated only on
the second image, i.e., for q points 6, 7, and 8.

If I check the charge_density.dat files in the subfolders of the q points in
the _ph0 directory I find that they're empty. Thus, I copied the q subfolders
of the second image by hand to the folder of the first image using:

cp -r _ph1/aluminum.q_* _ph0/

If I now restart without images, using the input of 2., it works...
Everything is fine...
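
The manual copy above can be generalized to more than two images. This is only
a sketch: the function name merge_image_qdirs is made up for illustration, and
the directory names _ph2, _ph3, ... for further images are an assumption
extrapolated from the _ph0 and _ph1 folders mentioned in this thread.

```shell
#!/bin/sh
# Sketch: copy the per-q subfolders written by images 1, 2, ... into the
# folder of image 0, generalizing "cp -r _ph1/aluminum.q_* _ph0/".
# merge_image_qdirs is a hypothetical helper, not part of Quantum ESPRESSO.
merge_image_qdirs() {
  outdir=$1   # directory containing _ph0, _ph1, ... (e.g. ./tempdir)
  prefix=$2   # prefix used in the ph.x input (e.g. aluminum)
  # Refuse to run if the folder of the first image is missing.
  [ -d "$outdir/_ph0" ] || return 1
  for img in "$outdir"/_ph[1-9]*; do
    [ -d "$img" ] || continue            # skip an unmatched glob pattern
    for qdir in "$img/$prefix".q_*; do
      [ -d "$qdir" ] || continue
      cp -r "$qdir" "$outdir/_ph0/"      # same copy as done by hand above
    done
  done
}
```

For the example in this thread the call would be
merge_image_qdirs ./tempdir aluminum, followed by the restart without images.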


4. Now I can also calculate the el-ph parameters using the input:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   electron_phonon='interpolated',
   el_ph_sigma=0.005,
   el_ph_nsigma=10,
!  recover=.true.
   trans=.false.,
   ldisp=.true.
!  max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /


5. Another problem I encounter is the following... Suppose the run time is not
enough to finish the el-ph calculations, i.e., instead of the input in 4. I use:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   electron_phonon='interpolated',
   el_ph_sigma=0.005,
   el_ph_nsigma=10,
!  recover=.true.
   trans=.false.,
   ldisp=.true.
   max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /

The code will stop at a certain point (in my case the 4th q point). If I now
restart the calculation using:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   electron_phonon='interpolated',
   el_ph_sigma=0.005,
   el_ph_nsigma=10,
   recover=.true.
   trans=.false.,
   ldisp=.true.
!  max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /

I get (again) the error message "Possibly too few bands at point ..." once the
code wants to calculate the wave functions for the 4th q point (the one at
which it stopped before)... All other points are fine...


I think that the whole problem is related to the storing of the wave functions
and the charge density. Maybe I'm doing something really wrong, but I don't see
any obvious error in the input... Also I don't see any input variable for ph
which influences the saving of wave functions...

Regards

Thomas

--
Dr. rer. nat. Thomas Brumme
Max Planck Institute for the Structure and Dynamics of Matter
Luruper Chaussee 149
22761 Hamburg

Tel:  +49 (0)40 8998 6557

email: Thomas.Brumme at mpsd.mpg.de

_______________________________________________
Q-e-developers mailing list
Q-e-developers at qe-forge.org
http://qe-forge.org/mailman/listinfo/q-e-developers



--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222



I can't test it now as I'm on a train with no WiFi, but I will do it on Monday... But looking at the code I would say yes ;)