[Pw_forum] Na3Bi calculation failure
Paolo Giannozzi
p.giannozzi at gmail.com
Wed Oct 21 11:27:21 CEST 2015
Unless you need new developments that are available only in the svn
version, please check whether it works with version 5.2.0. We just found a
problem (also affecting v.5.2.1) with "task groups" that may lead to
strange crashes.
Paolo
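
As a side note for readers, the run conditions quoted below (384 MPI processes with -nk 3 -ndiag 100 -ntg 8) can be sanity-checked with a few lines of arithmetic. The sketch below is not part of pw.x; the function name describe_layout and the particular checks are illustrative assumptions, based on the usual expectations that the pools divide the processes evenly, that -ndiag describes a square grid no larger than one pool, and that -ntg divides the processes in a pool.

```python
import math


def describe_layout(nproc, nk, ndiag, ntg):
    """Summarise the process layout implied by -nk, -ndiag and -ntg.

    Hypothetical helper for illustration only; it performs plain integer
    arithmetic and does not call or reproduce any Quantum ESPRESSO routine.
    """
    if nproc % nk != 0:
        raise ValueError("number of pools must divide the number of processes")
    procs_per_pool = nproc // nk
    side = math.isqrt(ndiag)  # side of the square linear-algebra grid
    ntg_divides = procs_per_pool % ntg == 0
    return {
        "procs_per_pool": procs_per_pool,
        "diag_grid_side": side,
        "ndiag_is_square": side * side == ndiag,
        "ndiag_fits_in_pool": ndiag <= procs_per_pool,
        "ntg_divides_pool": ntg_divides,
        "procs_per_task_group": procs_per_pool // ntg if ntg_divides else None,
    }


# Values taken from the quoted message: 16 nodes x 24 cores = 384 processes.
layout = describe_layout(nproc=384, nk=3, ndiag=100, ntg=8)
for key, value in layout.items():
    print(f"{key}: {value}")
```

Under these assumptions the quoted settings give 128 processes per pool, a 10 x 10 diagonalization grid that fits inside one pool, and 16 processes per task group, so the layout itself looks consistent.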
On Wed, Oct 21, 2015 at 11:04 AM, Kane O'Donnell <kane.odonnell at gmail.com>
wrote:
>
> Hi all,
>
> I'm wondering if I can get some help diagnosing a crash. I’m running
> the latest SVN version on a Cray XC40 (Magnus -
> https://www.pawsey.org.au/our-systems/magnus-technical-specifications/).
> Usually there are no problems, but I am having difficulty getting the
> attached slab calculation to run past the first few Davidson
> diagonalizations. It’s a 3x3 c-oriented slab of Na3Bi, the minimum I can
> use to capture a certain adsorbate reconstruction (this is just the bare
> slab). It’s only a 72-atom cell, but Bi has a lot of electrons (I think
> there are ~800 electrons and ~900 bands). I have spin-orbit coupling
> switched on (important for this
> solid), and I have been able to do calculations on the smaller unit cell
> using the library pseudopotentials listed in the species block.
> Calculations on systems of this size (e.g. O(1000) electrons, bands) are
> routine on Magnus, so I think I’m probably just doing something stupid but
> can’t seem to figure it out.
>
> Typical run conditions are 384 processors (16 nodes, 24 cores each), with
> -nk 3 -ndiag 100 -ntg 8. Moving down to ~12 nodes leads to an out-of-memory
> crash just as the code reports that it is allocating random wavefunctions
> at the beginning. From 16 nodes upwards, the crashes happen around the
> diagonalization step. Switching to CG works, but the slowdown is
> astronomical (~10000 seconds per SCF step, not feasible for a relaxation).
> A typical output is attached. With -ndiag > 1, the error is "problems
> computing cholesky"; with -ndiag = 1, the error is "S matrix not positive
> definite"; both come from cdiaghg. A search of the forums suggests this issue
> comes up every now and then on wildly different systems and is usually
> blamed on the user/compiler/lapack/blas/scalapack. So, details: QE was
> compiled by me on Magnus with PrgEnv-gnu (fortran 4.9.0) against the Cray
> libsci (includes fftw, scalapack, etc), with:
>
> ./configure --enable-parallel --with-scalapack=yes FC=ftn CC=cc
>
> and all tests pass with no problems.
>
> Any ideas? Let me know if there is any further information necessary.
>
> Best regards,
>
> Kane
>
> *Kane O'Donnell*
> *Postdoctoral Research Fellow | Department of Physics, Astronomy and
> Medical Radiation Science*
>
> *Curtin University*
> *Tel |* +61 8 9266 1381
> *Fax |* +61 8 9266 2377
>
> *Email |* kane.odonnell at curtin.edu.au
>
> &control
> calculation = 'relax',
> title = '',
> outdir = './',
> prefix = 'Na3Bi_331',
> pseudo_dir = '/group/partner1197/kodonnell/qe_pseudos/PSEUDOPOTENTIALS/',
> wf_collect = .true.
> /
> &system
> ibrav = 0,
> nat = 72,
> ntyp = 2,
> nbnd = 896,
> ecutwfc = 50,
> !ecutrho = 280,
> !tot_charge=+1.0,
> occupations = 'smearing',
> smearing = 'mv',
> degauss = 0.0073,
> lspinorb = .true.,
> noncolin = .true.,
> starting_magnetization(1) = 0.0,
> starting_magnetization(2) = 0.0
> /
> &electrons
> conv_thr = 1.0D-7
> /
> &ions
> /
> ATOMIC_SPECIES
> Bi 1.0 Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
> Na 1.0 Na.rel-pbe-spn-kjpaw_psl.0.2.UPF
> CELL_PARAMETERS angstrom
> 16.344 0 0
> -8.172 14.1543 0
> 1.77359e-15 3.07196e-15 28.965
> K_POINTS automatic
> 2 2 1 0 0 0
> ATOMIC_POSITIONS angstrom
> Bi 0 3.146 2.414 0 0 0
> Na 2.724 4.718 2.414 0 0 0
> Na 0 3.146 5.629
> Bi 2.724 1.573 7.241
> Na 2.724 4.718 7.241
> Na 2.724 1.573 0.801 0 0 0
> Na 2.724 1.573 4.026
> Na 0 3.146 8.854
> Bi -2.724 7.864 2.414 0 0 0
> Na 0 9.436 2.414 0 0 0
> Na -2.724 7.864 5.629
> Bi 0 6.291 7.241
> Na 0 9.436 7.241
> Na 0 6.291 0.801 0 0 0
> Na 0 6.291 4.026
> Na -2.724 7.864 8.854
> Bi -5.448 12.582 2.414 0 0 0
> Na -2.724 14.154 2.414 0 0 0
> Na -5.448 12.582 5.629
> Bi -2.724 11.009 7.241
> Na -2.724 14.154 7.241
> Na -2.724 11.009 0.801 0 0 0
> Na -2.724 11.009 4.026
> Na -5.448 12.582 8.854
> Bi 5.448 3.146 2.414 0 0 0
> Na 8.172 4.718 2.414 0 0 0
> Na 5.448 3.146 5.629
> Bi 8.172 1.573 7.241
> Na 8.172 4.718 7.241
> Na 8.172 1.573 0.801 0 0 0
> Na 8.172 1.573 4.026
> Na 5.448 3.146 8.854
> Bi 2.724 7.864 2.414 0 0 0
> Na 5.448 9.436 2.414 0 0 0
> Na 2.724 7.864 5.629
> Bi 5.448 6.291 7.241
> Na 5.448 9.436 7.241
> Na 5.448 6.291 0.801 0 0 0
> Na 5.448 6.291 4.026
> Na 2.724 7.864 8.854
> Bi 0 12.582 2.414 0 0 0
> Na 2.724 14.154 2.414 0 0 0
> Na 0 12.582 5.629
> Bi 2.724 11.009 7.241
> Na 2.724 14.154 7.241
> Na 2.724 11.009 0.801 0 0 0
> Na 2.724 11.009 4.026
> Na 0 12.582 8.854
> Bi 10.896 3.146 2.414 0 0 0
> Na 13.62 4.718 2.414 0 0 0
> Na 10.896 3.146 5.629
> Bi 13.62 1.573 7.241
> Na 13.62 4.718 7.241
> Na 13.62 1.573 0.801 0 0 0
> Na 13.62 1.573 4.026
> Na 10.896 3.146 8.854
> Bi 8.172 7.864 2.414 0 0 0
> Na 10.896 9.436 2.414 0 0 0
> Na 8.172 7.864 5.629
> Bi 10.896 6.291 7.241
> Na 10.896 9.436 7.241
> Na 10.896 6.291 0.801 0 0 0
> Na 10.896 6.291 4.026
> Na 8.172 7.864 8.854
> Bi 5.448 12.582 2.414 0 0 0
> Na 8.172 14.154 2.414 0 0 0
> Na 5.448 12.582 5.629
> Bi 8.172 11.009 7.241
> Na 8.172 14.154 7.241
> Na 8.172 11.009 0.801 0 0 0
> Na 8.172 11.009 4.026
> Na 5.448 12.582 8.854
>
--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
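
For reference, the size estimate in the quoted message (roughly 800 electrons and 900 bands) can be cross-checked against the quoted input. The sketch below is only a rough estimate: the atom counts are tallied from the ATOMIC_POSITIONS block, and the valence charges (15 for the Bi "dn" pseudopotential, 9 for the Na "spn" one) are assumptions that are not stated in the thread and should be verified against the z_valence entry in each UPF header.

```python
# Atom counts tallied from the ATOMIC_POSITIONS block of the quoted input
# (nine in-plane repeats, each listing 2 Bi and 6 Na).
N_ATOMS = {"Bi": 18, "Na": 54}

# ASSUMPTION: valence charges of the two pseudopotentials. These numbers are
# not given in the thread; check z_valence in the UPF files before relying
# on them.
Z_VALENCE = {"Bi": 15, "Na": 9}

# nbnd as set in the &system namelist of the quoted input.
NBND = 896


def total_valence_electrons(n_atoms, z_valence):
    """Sum the valence electrons over all species in the cell."""
    return sum(n_atoms[species] * z_valence[species] for species in n_atoms)


n_atoms_total = sum(N_ATOMS.values())
n_electrons = total_valence_electrons(N_ATOMS, Z_VALENCE)

# With noncolin = .true. each band is a spinor state holding at most one
# electron, so the number of occupied bands is about the electron count.
extra_band_fraction = NBND / n_electrons - 1.0

print(f"atoms: {n_atoms_total}")
print(f"valence electrons (assumed charges): {n_electrons}")
print(f"bands beyond the electron count: {extra_band_fraction:.1%}")
```

With these assumed charges the cell holds 756 valence electrons, so nbnd = 896 leaves a bit under 20% of the bands above the electron count, in line with the "~800 electrons and ~900 bands" estimate.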