[Q-e-developers] bug image parallelization in PHonon

Thomas Brumme thomas.brumme at mpsd.mpg.de
Wed Sep 14 13:30:55 CEST 2016


Dear all,

I think I found a bug in the image parallelization of PH, or I'm doing
something wrong. I used version 5.4, but the problem is also present in
the 6.0 beta. Some of you may remember my email to the regular mailing
list a few days ago concerning parallelization with the GRID technique;
the problem I encounter here is essentially the same. As an example, I
use a modified run_example_1 from the Recover_example directory of PH.

Description of the problem:

0. (Following the example) I did an scf calculation using 2 CPUs with:

  &control
     calculation='scf'
     restart_mode='from_scratch',
     prefix='aluminum',
     pseudo_dir = './',
     outdir='./tempdir/'
  /
  &system
     ibrav=  2, celldm(1) =7.5, nat= 1, ntyp= 1,
     ecutwfc =15.0,
     occupations='smearing', smearing='methfessel-paxton', degauss=0.05,
     la2F = .true.,
  /
  &electrons
     conv_thr =  1.0d-8
     mixing_beta = 0.7
  /
ATOMIC_SPECIES
  Al  26.98 Al.pz-vbc.UPF
ATOMIC_POSITIONS
  Al 0.00 0.00 0.00
K_POINTS {automatic}
  16 16 16  0 0 0


1. I then do a second scf calculation using 2 CPUs with:

  &control
     calculation='scf'
     restart_mode='from_scratch',
     prefix='aluminum',
     pseudo_dir = './',
     outdir='./tempdir/'
  /
  &system
     ibrav=  2, celldm(1) =7.5, nat= 1, ntyp= 1,
     ecutwfc =15.0,
     occupations='smearing', smearing='methfessel-paxton', degauss=0.05
  /
  &electrons
     conv_thr =  1.0d-8
     mixing_beta = 0.7
  /
ATOMIC_SPECIES
  Al  26.98 Al.pz-vbc.UPF
ATOMIC_POSITIONS
  Al 0.00 0.00 0.00
K_POINTS {automatic}
  8 8 8  0 0 0


2. I do a phonon calculation that stores the dvscf files and uses images.
Specifically, I ran:

mpirun -np 4 ph.x -ni 2 < al.elph.in

with al.elph.in given by:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
!  electron_phonon='interpolated',
!  el_ph_sigma=0.005,
!  el_ph_nsigma=10,
!  recover=.true.
!  trans=.false.,
   ldisp=.true.
   max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /

I set max_seconds to simulate the finite run time we have on our HPC
system. Restarting with recover=.true. works fine, i.e., I used:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
!  electron_phonon='interpolated',
!  el_ph_sigma=0.005,
!  el_ph_nsigma=10,
   recover=.true.
!  trans=.false.,
   ldisp=.true.
   max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /
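
As a side note, the two inputs in step 2 differ only in the recover
line, so the restart input can be generated from the first one. The
following is a minimal sketch, not part of the original example: the
function name make_recover_input and the output file name
al.elph.recover.in are hypothetical, and it assumes the first input
contains the commented-out line "!  recover=.true." exactly as shown
above.

```shell
# make_recover_input IN OUT
# Copy IN to OUT and activate the commented-out "recover=.true." line
# by replacing its leading "!" with a space. All other lines, including
# other commented-out variables, are left untouched.
# (Function name and file names are illustrative assumptions.)
make_recover_input() {
    sed 's/^!\( *recover=\.true\.\)/ \1/' "$1" > "$2"
}

# Example use (assumes al.elph.in exists in the current directory):
#   make_recover_input al.elph.in al.elph.recover.in
#   mpirun -np 4 ph.x -ni 2 < al.elph.recover.in
```

Deriving the restart input this way keeps the two runs from drifting
apart in any other variable.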


3. Now I want to collect all data without using images:

mpirun -np 2 ph.x < al.elph.in

with the same input file as given in 2.

I get the error "Possibly too few bands at point ..." as soon as the code
tries to recalculate the wave functions for the q points that were
calculated only on the second image, i.e., q points 6, 7, and 8.

If I check the charge_density.dat files in the q-point subfolders of the
_ph0 directory, I find that they are empty. I therefore copied the q
subfolders of the second image by hand to the folder of the first image
using:

cp -r _ph1/aluminum.q_* _ph0/

If I now restart without images, using the input of 2., it works and
everything is fine.
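
The manual check and copy described in step 3 can be sketched as two
small shell functions. This is only an illustration under assumptions:
the function names are hypothetical, the image directories are assumed
to be named _ph0, _ph1, ... inside outdir, and the q subfolders are
assumed to follow the <prefix>.q_* pattern used in the cp command above.
It generalizes that cp command to any number of images; it is not a fix
for the underlying problem.

```shell
# list_empty_files DIR
# Print every zero-byte regular file below DIR, e.g. to spot empty
# charge-density files in the q-point subfolders of _ph0.
list_empty_files() {
    find "$1" -type f -empty
}

# merge_image_dirs OUTDIR PREFIX
# Copy the q subfolders of every image directory _ph1, _ph2, ... into
# _ph0, overwriting files of the same name (same effect as the manual
# "cp -r _ph1/PREFIX.q_* _ph0/" applied to each image in turn).
merge_image_dirs() {
    outdir=$1
    prefix=$2
    for img in "$outdir"/_ph[1-9]*; do
        [ -d "$img" ] || continue          # skip if no such image dir
        for qdir in "$img"/"$prefix".q_*; do
            [ -d "$qdir" ] || continue     # skip if no q subfolders
            cp -r "$qdir" "$outdir/_ph0/"
        done
    done
}

# Example use (paths taken from the inputs above):
#   list_empty_files ./tempdir/_ph0
#   merge_image_dirs ./tempdir aluminum
```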


4. Now I can also calculate the el-ph parameters using the input:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   electron_phonon='interpolated',
   el_ph_sigma=0.005,
   el_ph_nsigma=10,
!  recover=.true.
   trans=.false.,
   ldisp=.true.
!  max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /


5. Another problem I encounter is the following. Suppose the run time is
not enough to finish the el-ph calculations, i.e., instead of the input
in 4. I use:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   electron_phonon='interpolated',
   el_ph_sigma=0.005,
   el_ph_nsigma=10,
!  recover=.true.
   trans=.false.,
   ldisp=.true.
   max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /

The code will stop at a certain point (in my case at the 4th q point).
If I now restart the calculation using:

Electron-phonon coefficients for Al
  &inputph
   tr2_ph=1.0d-10,
   prefix='aluminum',
   fildvscf='aldv',
   amass(1)=26.98,
   outdir='./tempdir/',
   fildyn='al.dyn',
   electron_phonon='interpolated',
   el_ph_sigma=0.005,
   el_ph_nsigma=10,
   recover=.true.
   trans=.false.,
   ldisp=.true.
!  max_seconds=6,
   nq1=4, nq2=4, nq3=4
  /

I get (again) the error message "Possibly too few bands at point ..." as
soon as the code tries to calculate the wave functions for the 4th q
point (the one at which it stopped before). All other points are fine.


I think the whole problem is related to how the wave functions and the
charge density are stored. Maybe I'm doing something really wrong, but I
don't see any obvious error in the input. I also don't see any ph input
variable that influences the saving of wave functions.

Regards

Thomas

-- 
Dr. rer. nat. Thomas Brumme
Max Planck Institute for the Structure and Dynamics of Matter
Luruper Chaussee 149
22761 Hamburg

Tel:  +49 (0)40 8998 6557

email: Thomas.Brumme at mpsd.mpg.de



