[QE-users] Ph.x crashing on multiple nodes

Paolo Giannozzi p.giannozzi at gmail.com
Sat Jan 2 15:02:04 CET 2021


On Sat, Jan 2, 2021 at 1:50 PM Alexandr Fonari <qe at afonari.com> wrote:

> This might also happen if "wf_collect = .FALSE." in your scf input
> (.TRUE. is default).
>

It's the default and only value, actually: "wf_collect=.false." is ignored.

Paolo

>
> On Sat, Jan 2, 2021, at 9:24 AM, Paolo Giannozzi wrote:
> > It seems to me that your hack should work. Unfortunately, the I/O of the
> > phonon code is quite confusing, and it is hard to figure out what is not
> > working, where, and why. Apparently the code tries to read the charge
> > density of the unperturbed system once again (it is already read and
> > stored in memory at the very beginning), and from a different place than
> > the standard one.
> >
> > Paolo
> >
> >
> > On Mon, Dec 28, 2020 at 7:47 PM Baer, Bradly
> > <bradly.b.baer at vanderbilt.edu> wrote:
> > > Hello all,
> > >
> > > I am experiencing a crash when running ph.x across multiple nodes.
> > > Input and output files are attached. The first q-point appears to be
> > > calculated correctly, but the code crashes when it starts on the
> > > second q-point. A file "charge-density" is reported as missing, but
> > > "charge-density.dat" exists when I inspect the files manually. Since
> > > the "file not found" error is reported 16 times, I assume this is an
> > > issue with running on multiple nodes (each node has 16 cores). A
> > > general description of my computing environment and workflow follows:
> > >
> > > I am using SLURM on a cluster. I have two nodes assigned to my job,
> > > each with a local scratch drive that is not visible to the other node.
> > > I also have access to a gpfs networked drive that both nodes can
> > > access. To improve performance, I am attempting to perform all
> > > calculations using the local scratch drives. All input files are
> > > copied from the gpfs networked drive to the local drive on each node
> > > before the initial pw.x calculation. After the pw.x calculation, a
> > > small script copies the output files (the pwscf.save folder and
> > > pwscf.xml) from the first node to the networked drive, and then a
> > > second script copies them from the networked drive to the second node
> > > before starting the phonon code.
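
A minimal sketch of that staging step in Python, assuming hypothetical paths
(LOCAL, GPFS) and the default prefix "pwscf" rather than the poster's actual
scripts:

# Hypothetical staging helpers: copy the pw.x restart files from node 1's
# local scratch to the networked drive, then pull them onto each node's
# local scratch before running ph.x.
import shutil
from pathlib import Path

PREFIX = "pwscf"                       # assumed pw.x prefix
LOCAL = Path("/tmp/scratch/qe_run")    # hypothetical per-node local scratch
GPFS = Path("/gpfs/project/qe_run")    # hypothetical networked staging area

def push_to_gpfs():
    """Run on node 1 after pw.x finishes."""
    shutil.copytree(LOCAL / f"{PREFIX}.save", GPFS / f"{PREFIX}.save",
                    dirs_exist_ok=True)
    shutil.copy2(LOCAL / f"{PREFIX}.xml", GPFS / f"{PREFIX}.xml")

def pull_from_gpfs():
    """Run on every node before ph.x is started."""
    shutil.copytree(GPFS / f"{PREFIX}.save", LOCAL / f"{PREFIX}.save",
                    dirs_exist_ok=True)
    shutil.copy2(GPFS / f"{PREFIX}.xml", LOCAL / f"{PREFIX}.xml")
    # Sanity check: the charge density file ph.x complains about should now
    # be present on this node's local scratch.
    assert (LOCAL / f"{PREFIX}.save" / "charge-density.dat").is_file()

If pull_from_gpfs() runs once per node (for example via srun), every node's
local outdir should contain the same pwscf.save directory before ph.x is
launched.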
> > >
> > > I am open to any suggestions, as this solution has been somewhat
> > > hacked together after performance using the gpfs networked drives
> > > proved incredibly poor.
> > >
> > > Thanks,
> > > Brad
> > >
> > >
> > > --------------------------------------------------------
> > > Bradly Baer
> > > Graduate Research Assistant, Walker Lab
> > > Interdisciplinary Materials Science
> > > Vanderbilt University
> > >
> > >
> > --
> > Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> > Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> > Phone +39-0432-558216, fax +39-0432-558222
> >


-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

