[QE-users] MD runs out of memory with increasing number of cores

Lenz Fiedler fiedler.lenz at gmail.com
Mon Jun 14 16:21:49 CEST 2021


Dear users,

I am trying to perform an MD simulation for a large cell (128 Fe atoms,
gamma point) using pw.x, and I get strange scaling behavior. To test the
performance, I ran the same MD simulation with an increasing number of nodes
(2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful
when using 2, 4, and 6 nodes, i.e. 48, 96, and 144 cores respectively (albeit
slow, which is within my expectations for such a small number of processors).
Going to 8 and more nodes, I run into an out-of-memory error after about
two time steps.
I am a little confused as to what the reason could be. Since a smaller
number of cores works, I would expect a higher number of cores to run
without an OOM error as well.
The 8 node run explicitly outputs at the beginning:
"     Estimated max dynamical RAM per process >     140.54 MB
      Estimated total dynamical RAM >      26.35 GB
"

which is well within the 2.5 GB I have allocated for each core.
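As a quick sanity check of those numbers, here is a minimal sketch of the arithmetic. It assumes one MPI process per core and 1 GB = 1024 MB (neither is stated in the output itself), and the variable names are only illustrative. Note that pw.x prints the estimate with a ">" sign, so it is a lower bound and not a guaranteed maximum.

```python
# Figures quoted above for the 8 node run
nodes = 8
cores_per_node = 24
est_ram_per_process_mb = 140.54  # "Estimated max dynamical RAM per process"
allocated_per_core_gb = 2.5      # memory requested per core from the scheduler

# Assumption: one MPI process per core
processes = nodes * cores_per_node

# Assumption: 1 GB = 1024 MB
est_total_gb = est_ram_per_process_mb * processes / 1024
allocated_total_gb = allocated_per_core_gb * processes

print(f"processes:       {processes}")
print(f"estimated total: {est_total_gb:.2f} GB")
print(f"allocated total: {allocated_total_gb:.0f} GB")
```

Under these assumptions, the per-process estimate multiplied by 192 processes reproduces the reported total of 26.35 GB, which is far below the roughly 480 GB allocated across the 8 nodes.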
I am obviously doing something wrong; could anyone point out what it is?
The input files for a 6 and 8 node run can be found here:
https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
I am using QE 6.6.

Kind regards
Lenz

PhD Student (HZDR / CASUS)