[Pw_forum] Na3Bi calculation failure

Kane O'Donnell kane.odonnell at gmail.com
Wed Oct 21 11:04:22 CEST 2015


Hi all,

I’m hoping to get some help diagnosing a crash. I’m running the latest SVN version on a Cray XC40 (Magnus - https://www.pawsey.org.au/our-systems/magnus-technical-specifications/). I usually have no problems, but I can’t get the attached slab calculation to run past the first few Davidson diagonalizations. It’s a 3x3 c-oriented slab of Na3Bi, the minimum I can use to capture a certain adsorbate reconstruction (this is just the bare slab). It’s only a 72-atom cell, but Bi has a lot of electrons (I think there are about 800 electrons and 900 bands). I have spin-orbit coupling switched on (important for this solid), and I have been able to do calculations on the smaller unit cell using the library pseudopotentials listed in the species block. Calculations on systems of this size (e.g. O(1000) electrons and bands) are routine on Magnus, so I think I’m probably just doing something stupid, but I can’t seem to figure it out.
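
As a rough check of the electron and band counts quoted above, the arithmetic can be sketched in Python. The valence charges used here (15 for Bi, 9 for Na) are assumptions inferred from the "dn" and "spn" labels in the pseudopotential file names, not values read from the UPF files; the z_valence field in each UPF header is the authoritative number. The atom counts (18 Bi, 54 Na) are counted from the ATOMIC_POSITIONS block of the input.

```python
# Rough electron and band bookkeeping for the 72-atom Na3Bi slab.
# ASSUMPTION: valence charges are guessed from the pseudopotential labels
# ("dn" for Bi, "spn" for Na); check z_valence in the UPF headers.
Z_VALENCE = {"Bi": 15, "Na": 9}
# Atom counts taken from the ATOMIC_POSITIONS block (18 Bi + 54 Na = 72).
N_ATOMS = {"Bi": 18, "Na": 54}
NBND = 896  # nbnd from the &system namelist


def count_electrons(z_valence, n_atoms):
    """Total number of valence electrons for the given species counts."""
    return sum(z_valence[species] * n for species, n in n_atoms.items())


n_elec = count_electrons(Z_VALENCE, N_ATOMS)
# With noncolin = .true. each band is a spinor holding one electron,
# so at least n_elec bands are needed before any empty bands are added.
n_empty = NBND - n_elec
print(f"electrons: {n_elec}, bands: {NBND}, empty bands: {n_empty}")
```

Under these assumptions the cell has 756 valence electrons, so nbnd = 896 leaves 140 empty bands, in line with the rough figures quoted above.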

Typical run conditions are 384 processors (16 nodes with 24 cores each), with -nk 3 -ndiag 100 -ntg 8. Moving down to ~12 nodes leads to an out-of-memory crash just as the code reports that it is allocating random wfs at the beginning. From 16 nodes upwards, the crashes happen around the diagonalization step. Switching to CG works, but the slowdown is astronomical (~10000 seconds per SCF step, not feasible for a relaxation). A typical output is attached. With -ndiag > 1, the error is “problems computing cholesky”; with -ndiag = 1, the error is “S matrix not positive definite”. Both come from cdiaghg.

A search of the forums suggests this issue comes up every now and then on wildly different systems and is usually blamed on the user/compiler/lapack/blas/scalapack. So, details: QE was compiled by me on Magnus with PrgEnv-gnu (Fortran 4.9.0) against the Cray libsci (includes fftw, scalapack, etc.), with:

./configure --enable-parallel --with-scalapack=yes FC=ftn CC=cc

and all tests are passed with no problems.
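
For reference, the run layout described above (384 MPI processes with -nk 3 -ndiag 100 -ntg 8) can be sanity-checked with a short sketch. The rules encoded here are assumptions about how pw.x distributes processes, not taken from its source: pools must divide the total process count, task groups must divide the processes in a pool, and the diagonalization group is the largest square grid not exceeding -ndiag, which has to fit inside one pool. The function name describe_layout is hypothetical.

```python
import math


def describe_layout(nproc, nk, ndiag, ntg):
    """Summarize the process split under the assumptions stated above."""
    if nproc % nk != 0:
        raise ValueError("nk must divide the total number of processes")
    per_pool = nproc // nk
    if per_pool % ntg != 0:
        raise ValueError("ntg must divide the number of processes per pool")
    # Largest square process grid that fits within ndiag.
    side = math.isqrt(ndiag)
    diag_procs = side * side
    if diag_procs > per_pool:
        raise ValueError("diagonalization grid does not fit in one pool")
    return {
        "procs_per_pool": per_pool,
        "diag_grid": (side, side),
        "procs_outside_diag_grid": per_pool - diag_procs,
        "procs_per_task_group": per_pool // ntg,
    }


layout = describe_layout(nproc=384, nk=3, ndiag=100, ntg=8)
for key, value in layout.items():
    print(f"{key}: {value}")
```

With these settings each pool holds 128 processes, 100 of which form a 10x10 diagonalization grid, and each task group spans 16 processes.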

Any ideas? Let me know if there is any further information necessary.
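
For context on the two error messages: both concern the overlap matrix S of the generalized eigenproblem H c = e S c. A common way to solve it starts with a Cholesky factorization of S, which only succeeds when S is positive definite. The sketch below is a plain-Python illustration of that failure mode for small real symmetric matrices, not QE's cdiaghg routine (which works on complex Hermitian matrices through LAPACK/ScaLAPACK). It shows how two nearly parallel basis vectors give an overlap matrix that is singular in double precision, so the factorization breaks down.

```python
import math


def cholesky(s):
    """Return lower-triangular L with L L^T = s, or raise if s is not
    (numerically) positive definite."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = s[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            raise ValueError(f"matrix not positive definite at pivot {j}")
        low[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (s[i][j] - acc) / low[j][j]
    return low


def is_positive_definite(s):
    """True if the Cholesky factorization of s succeeds."""
    try:
        cholesky(s)
    except ValueError:
        return False
    return True


def overlap(vectors):
    """Overlap (Gram) matrix S_ij = <v_i | v_j> for real vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors]
            for u in vectors]


well_separated = overlap([(1.0, 0.0), (1.0, 0.5)])
nearly_parallel = overlap([(1.0, 0.0), (1.0, 1e-9)])
print(is_positive_definite(well_separated))
print(is_positive_definite(nearly_parallel))
```

The second case fails because the 1e-18 contribution to the diagonal is lost to rounding, leaving a zero pivot. Whether near linear dependence of the trial wavefunctions is actually what happens in this run is not established here.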

Best regards,

Kane
Kane O'Donnell
Postdoctoral Research Fellow | Department of Physics, Astronomy and Medical Radiation Science

Curtin University
Tel | +61 8 9266 1381 
Fax | +61 8 9266 2377  

Email | kane.odonnell at curtin.edu.au



Curtin University is a trademark of Curtin University of Technology
CRICOS Provider Code 00301J

&control
  calculation = 'relax',
  title = '',
  outdir = './',
  prefix = 'Na3Bi_331',
  pseudo_dir = '/group/partner1197/kodonnell/qe_pseudos/PSEUDOPOTENTIALS/',
  wf_collect = .true.
/
&system
  ibrav = 0,
  nat = 72,
  ntyp = 2,
  nbnd = 896,
  ecutwfc = 50,
  !ecutrho = 280,
  !tot_charge=+1.0,
  occupations = 'smearing',
  smearing = 'mv',
  degauss = 0.0073,
  lspinorb = .true.,
  noncolin = .true.,
  starting_magnetization(1) = 0.0,
  starting_magnetization(2) = 0.0
/
&electrons
  conv_thr = 1.0D-7
/
&ions
/
ATOMIC_SPECIES
  Bi 1.0 Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
  Na 1.0 Na.rel-pbe-spn-kjpaw_psl.0.2.UPF
CELL_PARAMETERS angstrom
  16.344 0 0
  -8.172 14.1543 0
  1.77359e-15 3.07196e-15 28.965
K_POINTS automatic
  2 2 1 0 0 0
ATOMIC_POSITIONS angstrom
Bi    0    3.146    2.414    0    0    0
Na    2.724    4.718    2.414    0    0    0
Na    0    3.146    5.629
Bi    2.724    1.573    7.241
Na    2.724    4.718    7.241
Na    2.724    1.573    0.801    0    0    0
Na    2.724    1.573    4.026
Na    0    3.146    8.854
Bi    -2.724    7.864    2.414    0    0    0
Na    0    9.436    2.414    0    0    0
Na    -2.724    7.864    5.629
Bi    0    6.291    7.241
Na    0    9.436    7.241
Na    0    6.291    0.801    0    0    0
Na    0    6.291    4.026
Na    -2.724    7.864    8.854
Bi    -5.448    12.582    2.414    0    0    0
Na    -2.724    14.154    2.414    0    0    0
Na    -5.448    12.582    5.629
Bi    -2.724    11.009    7.241
Na    -2.724    14.154    7.241
Na    -2.724    11.009    0.801    0    0    0
Na    -2.724    11.009    4.026
Na    -5.448    12.582    8.854
Bi    5.448    3.146    2.414    0    0    0
Na    8.172    4.718    2.414    0    0    0
Na    5.448    3.146    5.629
Bi    8.172    1.573    7.241
Na    8.172    4.718    7.241
Na    8.172    1.573    0.801    0    0    0
Na    8.172    1.573    4.026
Na    5.448    3.146    8.854
Bi    2.724    7.864    2.414    0    0    0
Na    5.448    9.436    2.414    0    0    0
Na    2.724    7.864    5.629
Bi    5.448    6.291    7.241
Na    5.448    9.436    7.241
Na    5.448    6.291    0.801    0    0    0
Na    5.448    6.291    4.026
Na    2.724    7.864    8.854
Bi    0    12.582    2.414    0    0    0
Na    2.724    14.154    2.414    0    0    0
Na    0    12.582    5.629
Bi    2.724    11.009    7.241
Na    2.724    14.154    7.241
Na    2.724    11.009    0.801    0    0    0
Na    2.724    11.009    4.026
Na    0    12.582    8.854
Bi    10.896    3.146    2.414    0    0    0
Na    13.62    4.718    2.414    0    0    0
Na    10.896    3.146    5.629
Bi    13.62    1.573    7.241
Na    13.62    4.718    7.241
Na    13.62    1.573    0.801    0    0    0
Na    13.62    1.573    4.026
Na    10.896    3.146    8.854
Bi    8.172    7.864    2.414    0    0    0
Na    10.896    9.436    2.414    0    0    0
Na    8.172    7.864    5.629
Bi    10.896    6.291    7.241
Na    10.896    9.436    7.241
Na    10.896    6.291    0.801    0    0    0
Na    10.896    6.291    4.026
Na    8.172    7.864    8.854
Bi    5.448    12.582    2.414    0    0    0
Na    8.172    14.154    2.414    0    0    0
Na    5.448    12.582    5.629
Bi    8.172    11.009    7.241
Na    8.172    14.154    7.241
Na    8.172    11.009    0.801    0    0    0
Na    8.172    11.009    4.026
Na    5.448    12.582    8.854



-------------- next part --------------
A non-text attachment was scrubbed...
Name: Na3Bi_331.relax.out
Type: application/octet-stream
Size: 13750 bytes
Desc: not available
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20151021/9392a1a0/attachment.obj>