[QE-users] Ph.x crashing on multiple nodes

Alexandr Fonari qe at afonari.com
Sat Jan 2 13:49:21 CET 2021


This might also happen if "wf_collect = .FALSE." is set in your scf input (.TRUE. is the default).
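
For reference, a minimal sketch of the relevant &CONTROL fragment of a pw.x input (the prefix and outdir values below are placeholders, not taken from your files):

  &CONTROL
    calculation = 'scf'
    prefix      = 'pwscf'       ! placeholder
    outdir      = './scratch'   ! placeholder
    wf_collect  = .TRUE.        ! default in recent versions, as noted above;
                                ! writes the wavefunctions in a portable format
                                ! so they can be read back with a different
                                ! process/pool layout
  /
  (remaining namelists and cards unchanged)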

On Sat, Jan 2, 2021, at 9:24 AM, Paolo Giannozzi wrote:
> It seems to me that your hack should work. Unfortunately, the I/O of
> the phonon code is quite convoluted, and it is hard to figure out what
> is not working, where, and why. Apparently the code tries to read the
> charge density of the unperturbed system once again (it is already read
> and stored in memory at the very beginning), and from a different place
> than the standard one.
> 
> Paolo
> 
> 
> On Mon, Dec 28, 2020 at 7:47 PM Baer, Bradly 
> <bradly.b.baer at vanderbilt.edu> wrote:
> > Hello all,
> > 
> > I am experiencing a crash when running ph.x across multiple nodes. Input and output files are attached. The first q-point appears to be calculated correctly, but the code crashes when it starts calculating the second q-point. The error says a file "charge-density" cannot be found, yet "charge-density.dat" exists when I inspect the files manually. Since the error is reported 16 times and each node has 16 cores, I assume the problem comes from running across multiple nodes. A general description of my computing environment and workflow follows:
> > 
> > I am using SLURM on a cluster. I have two nodes assigned to my job, each with a local scratch drive that is not visible to the other node. I also have access to a gpfs networked drive that both nodes can access. To improve performance, I am attempting to perform all calculations on the local scratch drives. All input files are copied from the gpfs networked drive to the local drive of each node before the initial pw.x calculation. After the pw.x calculation, a small script copies the output files (the pwscf.save folder and pwscf.xml) from the first node to the networked drive, and a second script then copies them from the networked drive to the second node before the phonon code starts.
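> > 
> > In outline, the job script does something like the sketch below (the paths, file names, and exact launch commands are placeholders for this description, not the real scripts):
> > 
> >   #!/bin/bash
> >   #SBATCH --nodes=2
> >   #SBATCH --ntasks-per-node=16
> >   # Sketch only: paths and launch options are placeholders.
> >   GPFS_DIR=/gpfs/path/to/project        # shared drive, visible to both nodes
> >   LOCAL=/local/scratch/$SLURM_JOB_ID    # node-local scratch, not shared
> > 
> >   # 1) Stage the input files onto the local scratch of every node.
> >   srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 \
> >       bash -c "mkdir -p $LOCAL && cp $GPFS_DIR/scf.in $GPFS_DIR/ph.in $LOCAL/"
> > 
> >   # 2) Run the scf step from the local scratch.
> >   cd $LOCAL
> >   srun pw.x -in scf.in > $GPFS_DIR/scf.out
> > 
> >   # 3) Copy the pw.x results from the first node (where this script runs)
> >   #    to the shared drive ...
> >   cp -r $LOCAL/pwscf.save $LOCAL/pwscf.xml $GPFS_DIR/
> > 
> >   # 4) ... then from the shared drive to the local scratch of every node
> >   #    (on the first node this just overwrites identical files), and run ph.x.
> >   srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 \
> >       bash -c "cp -r $GPFS_DIR/pwscf.save $GPFS_DIR/pwscf.xml $LOCAL/"
> >   srun ph.x -in ph.in > $GPFS_DIR/ph.out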
> > 
> > I am open to any suggestions, as this solution was somewhat hacked together after performance on the gpfs networked drive proved extremely poor.
> > 
> > Thanks,
> > Brad
> > 
> > 
> > --------------------------------------------------------
> > Bradly Baer
> > Graduate Research Assistant, Walker Lab
> > Interdisciplinary Materials Science
> > Vanderbilt University
> > 
> > 
> 
> 
> -- 
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
> 
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users

