[QE-users] MD runs out of memory with increasing number of cores

Lenz Fiedler fiedler.lenz at gmail.com
Fri Jun 18 10:25:03 CEST 2021


Dear Prof. Giannozzi,

Thanks so much for the insight! I realize I might have left out a crucial
piece of information: the OOM error does not appear right away; it appears
after a certain number of time steps (and, as far as I can tell, somewhat
reproducibly). For the 192-core example I sent, this number was 4 time
steps. In parallel with these iron calculations I am also investigating
Be, and I have found similar behavior there. I have uploaded the input
files for a beryllium run to the Google Drive folder from my first message.
For that calculation I can do ~2700 time steps just fine (which took about
8 hours) and only then get the OOM error. Is there some option I am
forgetting to set, such that some arrays keep accumulating and
eventually exhaust the memory?

I understand that just using more and more processors will not necessarily
give me better performance, but in the performance test I did, going from
48 processors to 144 reduced the average time per time step from over
1000 s to 200 s (a plot of this is in the Google Drive folder as well). I
am aiming for ~30 s per time step, since I want to perform 10000 time
steps to get a 10 ps trajectory, so I was trying to investigate how
performance would be affected if I used slightly more processors. I will
try the -ntg option. The best performance I have achieved so far was with
144 cores defaulting to -nb 144, so am I correct to assume that I should
try e.g. -nb 144 -ntg 2 for 288 cores?
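
To spell it out, I would launch something along these lines (mpirun vs. srun
and the file names are just placeholders for my actual setup):

  mpirun -np 288 pw.x -nb 144 -ntg 2 -inp md.in > md.out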

The 80 Ry cutoff was the result of a convergence analysis I did for this
system, although I could probably lower it, since I am more interested in
sampling configurations for a machine-learning application than in
macroscopic properties derived directly from the MD calculation.
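
For reference, the relevant settings currently look roughly like this
(everything else omitted), and I will also switch to the plain Gamma card you
suggested instead of the automatic 1 1 1 0 0 0 mesh:

  &SYSTEM
    ...
    ecutwfc = 80.0
    ecutrho = 320.0
    ...
  /

  K_POINTS gamma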

Kind regards
Lenz

PhD Student (HZDR / CASUS)

On Wed, 16 Jun 2021 at 07:33, Paolo Giannozzi <p.giannozzi at gmail.com>
wrote:

> Hard to say without knowing exactly what exceeds which memory limit.
> Note that not all arrays are distributed across processors, so a
> considerable number of arrays are replicated on all processes. As a
> consequence, the total amount of required memory will increase with the
> number of MPI processes. Also note that a 128-atom cell is not "large" and
> 144 cores are not "a small number of processors". You will not get any
> further advantage by just increasing the number of processors; quite the
> opposite. If you have too many idle cores, you should consider
> - "task group" parallelization (option -ntg)
> - MPI+OpenMP parallelization (configure --enable-openmp)
> Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP
> (while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of
> 1 1 1 0 0 0 will be faster and take less memory.
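>
> For instance, with an OpenMP-enabled build, a hybrid run on 192 cores could
> look something like this (just a sketch; adjust thread/process counts and
> file names to your machine and scheduler):
>
>   export OMP_NUM_THREADS=4
>   mpirun -np 48 pw.x -inp md.in > md.out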
>
> Paolo
>
> On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.lenz at gmail.com>
> wrote:
>
>> Dear users,
>>
>> I am trying to perform an MD simulation for a large cell (128 Fe atoms,
>> gamma point) using pw.x, and I see strange scaling behavior. To test the
>> performance I ran the same MD simulation with an increasing number of nodes
>> (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful
>> when using 2, 4, and 6 nodes, i.e. 48, 96, and 144 cores respectively
>> (albeit slow, which is within my expectations for such a small number of
>> processors). Going to 8 or more nodes, I run into an out-of-memory error
>> after about two time steps.
>> I am a little bit confused as to what the reason could be. Since a
>> smaller number of cores works, I would not expect a higher number of cores
>> to run into an OOM error.
>> The 8-node run explicitly outputs at the beginning:
>> "     Estimated max dynamical RAM per process >     140.54 MB
>>       Estimated total dynamical RAM >      26.35 GB
>> "
>>
>> which is well within the 2.5 GB I have allocated for each core.
>> I am obviously doing something wrong; could anyone point out what it is?
>> The input files for a 6 and 8 node run can be found here:
>> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
>> I am using QE 6.6.
>>
>> Kind regards
>> Lenz
>>
>> PhD Student (HZDR / CASUS)
>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>