[Pw_forum] parallel diag. "failure" with too large number of k-points?????

Giovanni Cantele giovanni.cantele at na.infn.it
Thu Jan 14 13:09:09 CET 2010


Dear all,

I'm doing some test runs on bulk silicon, using QE 4.0.5. I need to calculate the eigenvalues on dense k-point grids,
to test the convergence of some properties. So, I did a simple series of scf + nscf runs with increasing (automatic) k-point grids
(from 8 8 8    1 1 1 to 24 24 24    1 1 1). The calculations are parallel and use 8 CPUs.
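
(For reference, a minimal sketch of how such a series of runs can be scripted; the MPI launcher, the -in flag, and the grid list are illustrative assumptions, not my actual job script:

  # scf step once, then the nscf step for a series of automatic k-point grids
  mpirun -np 8 pw.x -in Si.scf.in > Si.scf.out
  for n in 8 12 16 20 24; do
    sed "s/^24  24  24 .*/$n $n $n    1 1 1/" Si.nscf.in > Si.nscf.$n.in
    mpirun -np 8 pw.x -in Si.nscf.$n.in > Si.nscf.$n.out
  done
)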

All the grids with a number of k-points > 804 show a strange behaviour: once the 804th k-point is reached
(that is, once the "Computing kpt #:   804" line appears in the output) the calculation keeps running, but the output files are no longer updated.

Has anybody experienced this kind of problem?


Just to help (me and you) understand what is going on, I did some debugging:

i) the problem disappears if I use -ndiag 1
(note that, if -ndiag is not specified, the parallel algorithm is used by default, and the output header contains
Iterative solution of the eigenvalue problem

     a parallel distributed memory algorithm will be used,
     eigenstates matrixes will be distributed block like on
     ortho sub-group =    2*   2 procs
)
Of course, I don't know whether in this case (that is, with -ndiag 1) the problem would show up with much denser grids (e.g. 64 64 64    1 1 1), but
I would guess not.
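
(for reference, this is the kind of invocation I mean; the MPI launcher and file names are assumptions:

  # same 8-CPU run, but with the serial diagonalization forced
  mpirun -np 8 pw.x -ndiag 1 -in Si.nscf.in > Si.nscf.out
)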

ii) the problem is maybe a memory (allocation/deallocation?) issue, because (see also point iv), when the parallel algorithm is used (which is what the code
chooses by default), the problem also disappears on decreasing the cutoff from 30 Ry to 15 Ry
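
(a way to test the memory hypothesis while the job runs is to watch the resident size of the pw.x processes; a sketch, assuming a standard Linux ps:

  # print PID, resident (RSS) and virtual (VSZ) size of all pw.x processes every 5 s
  watch -n 5 'ps -C pw.x -o pid,rss,vsz,comm'
)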

iii) the code hangs (that is, it keeps running without doing anything) in SUBROUTINE pcegterg (PW/cegterg.f90)
at the line
CALL zsqmred( nbase, vl, desc_old( nlax_ ), desc_old, nbase+notcnv, hl, nx, desc )
(this means that the calculation of the 804th k-point executes the line immediately before this call, but not the one immediately after).
Of course, all this provided I did my debugging correctly.
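
(to double-check where a rank hangs, one can attach a debugger to one of the running MPI processes; a sketch, assuming gdb is available and pw.x was compiled with debug symbols:

  # attach to one pw.x rank and inspect where it is stuck
  gdb -p "$(pgrep -o pw.x)"
  # at the (gdb) prompt, the "bt" command prints the backtrace
)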

iv) using QE 4.1.2, the code runs about 25% slower and stops at exactly the same k-point, but in this case it does not keep running:
it aborts with a segmentation fault error


Another issue I would like to point out: in the cases where the calculation finishes correctly, the run with "-ndiag 1" is much
faster (I'm not sure, but maybe it takes half the time), so it might be better to set up the code so that, in these "not-expensive" cases, the
parallel diagonalization is disabled by default.

Giovanni



PS: these are my input files:

>>>>>>>>>>>>>>>> Si.scf.in
&CONTROL
  calculation  = 'scf'
  title        = 'Si'
  restart_mode = 'from_scratch'
  outdir       = '/scratch/cantele/prova'
  prefix       = 'Si'
  pseudo_dir   = '/home/nm_settings/software/CODES/Quantum-ESPRESSO/pseudo'
  wf_collect   = .true.
  verbosity    = 'high'
/
&SYSTEM
  ibrav        = 2
  celldm(1)    = 10.20927
  nat          = 2
  ntyp         = 1
  ecutwfc      = 30.0
/
&ELECTRONS
  conv_thr     = 1.0d-8
  mixing_beta  = 0.7
/
ATOMIC_SPECIES
  Si    28.0855    Si.pz-vbc.UPF
ATOMIC_POSITIONS { alat }
  Si 0.00 0.00 0.00
  Si 0.25 0.25 0.25
K_POINTS { automatic }
8  8  8    0  0  0



>>>>>>>>>>>>>>>> Si.nscf.in
&CONTROL
  calculation  = 'nscf'
  title            = 'Si'
  restart_mode     = 'from_scratch'
  outdir           = '/scratch/cantele/prova'
  prefix           = 'Si'
  pseudo_dir       = '/home/nm_settings/software/CODES/Quantum-ESPRESSO/pseudo'
  wf_collect       = .true.
  verbosity        = 'high'
/
&SYSTEM
  ibrav            = 2
  celldm(1)        = 10.20927
  nat              = 2
  ntyp             = 1
  ecutwfc          = 30.0
  nbnd             = 60
/
&ELECTRONS
  diago_full_acc   = .true.
  diago_thr_init   = 1.0d-6
/
ATOMIC_SPECIES
  Si    28.0855    Si.pz-vbc.UPF
ATOMIC_POSITIONS { alat }
  Si 0.00 0.00 0.00
  Si 0.25 0.25 0.25
K_POINTS { automatic }
24  24  24    1  1  1

--

Dr. Giovanni Cantele
Coherentia CNR-INFM and Dipartimento di Scienze Fisiche
Universita' di Napoli "Federico II"
Complesso Universitario di Monte S. Angelo - Ed. 6
Via Cintia, I-80126, Napoli, Italy
Phone: +39 081 676910
Fax:   +39 081 676346
E-mail: giovanni.cantele at cnr.it
              giovanni.cantele at na.infn.it
Web: http://people.na.infn.it/~cantele
Research Group: http://www.nanomat.unina.it
Skype contact: giocan74
