[Pw_forum] [Fwd: diagonalization failure (david, cg) for large numbers of bands]

Vivek Ranjan vranjan at ncsu.edu
Tue Dec 1 21:17:49 CET 2009


Hi,

## Summary:  Running pw.x on 128-1024 processors, testing a bulk 64-atom Si
cell at gamma (gamma tricks not used because of incompatibility with subsequent
calculations) with a "large" number of (extra) bands.  No problems are reported
when nbnd is small.  With 128-256 processors and nbnd>1300, Davidson diag makes
the program exit before completing 1 scf step with a Cholesky decomposition
failure error; iterative diag (cg) fails at the same stage with the error
"(ZHEGV*) failed".  The system is a Cray XT4.

## Purpose:  reproducing the beautiful results of PHYSICAL REVIEW B 79, 201104,
2009 for GWW education purposes.  :)

## Background:  I have found similar-looking problems reported here, and have
tried several of the recommendations (switching to ndiag 1 at runtime to use
serial diag instead of parallel; switching from david to cg).

In addition, I have tried increasing the PW cutoff (to provide more PWs
relative to the requested bands for the sake of Davidson diag), but this does
not really help.
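
For a rough sense of how many PWs are available relative to the requested
bands, the count at gamma can be estimated from the cell volume and the cutoff.
The following is only a sketch in Python, assuming the usual sphere-counting
estimate N_PW ~ V * k_cut^3 / (6 pi^2) with k_cut = sqrt(ecutwfc) in Rydberg
atomic units, and a cubic cell of side celldm(1).  The function name is
hypothetical, and the count pw.x actually reports will differ somewhat.

```python
import math


def estimate_npw(alat_bohr, ecutwfc_ry):
    """Approximate number of plane waves at gamma for a cubic cell.

    Assumption: counts reciprocal-lattice points inside a sphere of radius
    k_cut = sqrt(ecutwfc) (Rydberg atomic units), i.e. V * k_cut^3 / (6 pi^2).
    """
    volume = alat_bohr ** 3           # cell volume in bohr^3
    k_cut = math.sqrt(ecutwfc_ry)     # cutoff wavevector in bohr^-1
    return volume * k_cut ** 3 / (6.0 * math.pi ** 2)


if __name__ == "__main__":
    # Values taken from the sample input: celldm(1)=20.52, ecutwfc=35.0
    npw = estimate_npw(20.52, 35.0)
    nbnd = 3328
    print(f"estimated PWs at gamma : {npw:.0f}")
    print(f"PWs per requested band : {npw / nbnd:.1f}")
```

For the sample input this estimate is on the order of 3x10^4 PWs, so even
nbnd = 3328 is well below the basis size.  It says nothing about whether the
diagonalization itself is well conditioned.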

I also attempted a regular SCF calculation with no nbnd specification,
followed by an NSCF calculation with extra bands specified.  The same errors
are obtained.

## Current status:  I am now trying to rule out memory-related errors (by
running on more nodes), and will update this thread if the problem turns out to
be related to memory requirements.  Running on 512 processors permitted
nbnd=2500 (converged results should require ~3300 bands for this particular
calculation, according to my understanding of the noted paper), and I have some
1024-processor runs queued up.

It does not seem to me that such a system, even with so many states, should
have such large memory demands, so I am wondering if I am doing something
stupendously wrong (or perhaps not exactly doing something wrong, but failing
to do something glaringly obvious that would solve the problem).  Below is my
input file, followed by some brief technical specs in case they are helpful.

## Sample input file:

&control
 calculation='scf'
 restart_mode='from_scratch',
 prefix='si'
 outdir='/scr/josepht/espresso/bsi64/Large_GAMMA/STEP_B/tmp'
 pseudo_dir='/scr/josepht/espresso/bsi64/pseudo'
/
&system
 ibrav= 8,
 celldm(1)= 20.52,
 celldm(2)= 1,
 celldm(3)=1,
 nat=  64,
 ntyp= 1,
 ecutwfc = 35.0,
 nosym=.true.
 nbnd = 3328,
/
&electrons
 diagonalization='david',
 conv_thr =  1.0d-8,
 mixing_beta = 0.5,
/
ATOMIC_SPECIES
Si  1. Si.pbe-rrkj.UPF
ATOMIC_POSITIONS (bohr)
Si      0.00000000        0.00000000        0.00000000
Si      5.13000000        5.13000000        0.00000000
Si      0.00000000        5.13000000        5.13000000
Si      5.13000000        0.00000000        5.13000000
Si      2.56500000        2.56500000        2.56500000
Si      7.69500000        7.69500000        2.56500000
Si      7.69500000        2.56500000        7.69500000
Si      2.56500000        7.69500000        7.69500000
Si     10.26000000        0.00000000        0.00000000
Si     15.39000000        5.13000000        0.00000000
Si     10.26000000        5.13000000        5.13000000
Si     15.39000000        0.00000000        5.13000000
Si     12.82500000        2.56500000        2.56500000
Si     17.95500000        7.69500000        2.56500000
Si     17.95500000        2.56500000        7.69500000
Si     12.82500000        7.69500000        7.69500000
Si      0.00000000       10.26000000        0.00000000
Si      5.13000000       15.39000000        0.00000000
Si      0.00000000       15.39000000        5.13000000
Si      5.13000000       10.26000000        5.13000000
Si      2.56500000       12.82500000        2.56500000
Si      7.69500000       17.95500000        2.56500000
Si      7.69500000       12.82500000        7.69500000
Si      2.56500000       17.95500000        7.69500000
Si      0.00000000        0.00000000       10.26000000
Si      5.13000000        5.13000000       10.26000000
Si      0.00000000        5.13000000       15.39000000
Si      5.13000000        0.00000000       15.39000000
Si      2.56500000        2.56500000       12.82500000
Si      7.69500000        7.69500000       12.82500000
Si      7.69500000        2.56500000       17.95500000
Si      2.56500000        7.69500000       17.95500000
Si     10.26000000       10.26000000        0.00000000
Si     15.39000000       15.39000000        0.00000000
Si     10.26000000       15.39000000        5.13000000
Si     15.39000000       10.26000000        5.13000000
Si     12.82500000       12.82500000        2.56500000
Si     17.95500000       17.95500000        2.56500000
Si     17.95500000       12.82500000        7.69500000
Si     12.82500000       17.95500000        7.69500000
Si     10.26000000        0.00000000       10.26000000
Si     15.39000000        5.13000000       10.26000000
Si     10.26000000        5.13000000       15.39000000
Si     15.39000000        0.00000000       15.39000000
Si     12.82500000        2.56500000       12.82500000
Si     17.95500000        7.69500000       12.82500000
Si     17.95500000        2.56500000       17.95500000
Si     12.82500000        7.69500000       17.95500000
Si      0.00000000       10.26000000       10.26000000
Si      5.13000000       15.39000000       10.26000000
Si      0.00000000       15.39000000       15.39000000
Si      5.13000000       10.26000000       15.39000000
Si      2.56500000       12.82500000       12.82500000
Si      7.69500000       17.95500000       12.82500000
Si      7.69500000       12.82500000       17.95500000
Si      2.56500000       17.95500000       17.95500000
Si     10.26000000       10.26000000       10.26000000
Si     15.39000000       15.39000000       10.26000000
Si     10.26000000       15.39000000       15.39000000
Si     15.39000000       10.26000000       15.39000000
Si     12.82500000       12.82500000       12.82500000
Si     17.95500000       17.95500000       12.82500000
Si     17.95500000       12.82500000       17.95500000
Si     12.82500000       17.95500000       17.95500000
K_POINTS
1
0.0 0.0 0.0 1.0

##END OF INPUT

The above file runs when nbnd = 1280, and (possibly) relevant output from the
successful run includes:

Each subspace H/S matrix      400.00 Mb     (   5120,5120)
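
The reported figure can be extrapolated to larger nbnd.  The following is only
a sketch in Python under two assumptions read off the output line itself: the
matrix dimension is 4 x nbnd (5120 = 4 x 1280), and each element takes 16 bytes
(double-precision complex), since 5120^2 x 16 bytes = 400 Mb.  The function
name and defaults are hypothetical, not taken from the code.

```python
def subspace_matrix_mib(nbnd, ndim_factor=4, bytes_per_element=16):
    """Size in MiB of one square subspace matrix.

    Assumptions (inferred from the single reported output line, not from the
    source code): dimension = ndim_factor * nbnd, 16 bytes per element.
    """
    dim = ndim_factor * nbnd
    return dim * dim * bytes_per_element / 2 ** 20


if __name__ == "__main__":
    # 1280 is the run that succeeded; 2500 and 3328 are the larger targets.
    for nbnd in (1280, 2500, 3328):
        dim = 4 * nbnd
        size = subspace_matrix_mib(nbnd)
        print(f"nbnd={nbnd:5d}  dim={dim:6d}  {size:9.2f} Mb per matrix")
```

Under these assumptions nbnd = 1280 reproduces the 400.00 Mb reported above,
while nbnd = 3328 gives 2704 Mb for each matrix, growing as nbnd^2.  Whether
that is held in full by every process or distributed depends on the
diagonalization setup (for example the ndiag setting), which this sketch does
not model.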

## Technical specs:  Code was compiled on a Cray XT4 (unsure if compilation
details would be helpful), and runs were performed on Cray XT4 nodes with two
quad-core 2.3 GHz AMD Opteron processors with 16 GBytes of usable memory
(requesting 4 cores per node).

I've read here that the problem might be related to libraries/compilers
(issues with PGI, ACML, etc.).  If that is likely the case, I would be
interested in insight regarding optimal compilation on Cray.

Thanks in advance for any assistance, and I apologize if this question has
essentially already been answered on the forum.  I searched but did not come
across an explicit solution to something matching this, though I admit that the
general theme is present in several independent threads.

Joseph Turnbull
Department of Physics
NC State University


