[Pw_forum] pw.x stops seemingly randomly with no warning when northo > 1

Andrew Supka supka1ar at cmich.edu
Sun Jan 28 08:12:36 CET 2018


My make.inc is attached for some details on how I compiled QE.

pw.x runs fine for a while, then stops with no visible warning or error
message in stdout. I suspect it's either scalapack, the cluster environment,
or a combination of both. I haven't noticed any pattern in whether or when
pw.x stops in this way.

I've only seen this happen when running large-ish jobs of several hundred
atoms or more on multiple computing nodes with northo larger than 1
(usually I'll set northo to 16, 25, 36, 64, etc., depending on the number of
procs and nodes).
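
For illustration, here is a minimal sketch of how such a value could be
picked, assuming northo should be a perfect square no larger than the number
of MPI processes available for the diagonalization. The helper name
suggest_northo and the example process counts are hypothetical; this is not
taken from pw.x itself.

```python
import math


def suggest_northo(nprocs: int) -> int:
    """Return the largest perfect square <= nprocs (hypothetical helper).

    Assumption: the linear-algebra group is laid out as a square
    process grid, so only values like 1, 4, 9, 16, 25, ... make sense.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    side = math.isqrt(nprocs)  # integer square root, rounds down
    return side * side


if __name__ == "__main__":
    # Example process counts are made up for illustration only.
    for nprocs in (16, 28, 40, 64, 72):
        print(f"{nprocs} procs -> northo {suggest_northo(nprocs)}")
```

The result would then be passed on the pw.x command line (for example as
-northo 36); whether a smaller value is more stable on a given cluster has to
be tested.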

I'm really only guessing, but it might be some kind of communication issue
between nodes that causes scalapack to error out and halt pw.x.

Has anyone else come across this behavior? I've noticed it across multiple
versions of QE and of the Intel MKL and compilers.

Andrew Supka
PhD Student
Central Michigan University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.inc
Type: application/octet-stream
Size: 6659 bytes
Desc: not available
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20180128/3e5f418b/attachment.obj>

