[QE-users] Running efficiently on multiple nodes

Wed Nov 4 23:28:07 CET 2020

Hello,

I am hoping to get some advice on how to run QE efficiently on multiple nodes in a cluster.  I have a simulation that I have been running on 1 node/16 cores and would like to scale up to 2 nodes/32 cores.  Our cluster has a GPFS networked filesystem that has previously caused performance issues due to QE's large writes.  On one node, I solved this by moving the input files to the node's local HDD/SSD, running the simulation there, and then moving the results back to the networked filesystem.  Now that I am using two nodes, the single-node script crashes shortly after reading in the pseudopotentials.
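
The single-node workaround described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual job script. The function name run_staged, the scratch_root parameter, and all file and directory names are hypothetical placeholders.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(input_dir, result_dir, command, scratch_root=None):
    """Copy inputs to local scratch, run the command there, copy everything back.

    scratch_root is a placeholder for the node-local disk; if None, the
    system default temporary directory is used.
    """
    input_dir = Path(input_dir)
    result_dir = Path(result_dir)
    # Create a private working directory on the (assumed) node-local disk.
    work_dir = Path(tempfile.mkdtemp(prefix="qe_stage_", dir=scratch_root))
    try:
        # Stage in: copy the input files into the working directory.
        shutil.copytree(input_dir, work_dir, dirs_exist_ok=True)
        # Run the simulation with the working directory as the current directory.
        completed = subprocess.run(command, cwd=work_dir, check=False)
        # Stage out: copy inputs and outputs back, even if the run failed,
        # so that any log files are preserved.
        shutil.copytree(work_dir, result_dir, dirs_exist_ok=True)
        return completed.returncode
    finally:
        # Remove the working directory so the local disk does not fill up.
        shutil.rmtree(work_dir, ignore_errors=True)
```

A call might look like run_staged("inputs", "results", ["mpirun", "-np", "16", "pw.x", "-i", "pw.in"]), where the directory and input file names are again placeholders. Note that a sketch like this stages files only on the node where the script itself runs.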

I have been able to finish a test simulation using only the network storage, so I believe that QE is configured properly to run across multiple nodes.  However, I observed a significant performance hit with the networked storage: the run went from ~1 minute on 1 node with the local drive to ~7 minutes on 2 nodes with networked storage.

I would like to be able to return to using the local drives on each node to avoid these issues with networked storage.  Is there some sort of setting inside QE that can help me with this or is this something that I need to work with my cluster admin team to resolve?

Additional info:  Our cluster uses SLURM for job submission, and I am currently using pw.x and ph.x from QE.

Thanks,
Brad

--------------------------------------------------------
Bradly Baer
Graduate Research Assistant, Walker Lab
Interdisciplinary Materials Science
Vanderbilt University
