[QE-users] MD runs out of memory with increasing number of cores
Lenz Fiedler
fiedler.lenz at gmail.com
Wed Jun 23 13:30:39 CEST 2021
Dear Prof. Giannozzi,
I have also not encountered this behavior for Fe with 36 processors; it
only started for me with more than 144 processors. But since these
additional processors will probably not give me a performance increase
anyway (and the increase in the number of processes is most likely the
reason for the error, if I understand you correctly?), this is not a problem. For
my Beryllium calculation it is more problematic since the 144 processors
case really gives the best performance (I have uploaded a file called
performance_Be128.png to show my timing results), but I still run out of
memory after 2700 time steps. This is manageable, though, since I can
always restart the calculation and perform another 2700 time steps. With
this I was able to perform 10,000 time steps in just over a day. I am
running more calculations on larger Be and Fe cells and I will investigate
this behavior there.
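
The restart workaround described above amounts to splitting the trajectory into runs that each stay at or below the step count at which memory runs out. A minimal sketch of that bookkeeping in Python follows; the function name plan_restarts is hypothetical, and the 2700-step limit is simply the figure observed for the Be cell, not a general rule.

```python
def plan_restarts(total_steps, max_steps_per_run):
    """Split total_steps into consecutive runs of at most max_steps_per_run."""
    if total_steps < 0 or max_steps_per_run <= 0:
        raise ValueError("total_steps must be >= 0 and max_steps_per_run > 0")
    # Full-length runs first, then one shorter run for whatever is left.
    chunks = [max_steps_per_run] * (total_steps // max_steps_per_run)
    remainder = total_steps % max_steps_per_run
    if remainder:
        chunks.append(remainder)
    return chunks


if __name__ == "__main__":
    # Figures from the Be run: 10,000 steps, at most 2700 steps per run.
    chunks = plan_restarts(10_000, 2_700)
    print(len(chunks), chunks)  # 4 [2700, 2700, 2700, 1900]
```

With these numbers the 10,000 steps need four runs, i.e. three restarts.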
I have also used the "gamma" option for the K-points to use the performance
benefits you outlined. For the Fe128 cell, I achieved optimal performance
with 144 processors and using the "gamma" option (resulting in about 90s
per SCF cycle). I am still not within my personal target of ~30s per SCF
cycle but I will start looking into the choice of my PSP and cutoff (along
with considering OpenMP and task group parallelization) rather than blindly
throwing more and more processors at the problem.
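
To put the ~30s target in perspective, the wall time of a trajectory can be estimated from the time per step. The sketch below is only a rough estimate under two assumptions that are not stated above: one SCF cycle per MD time step, and a trajectory of 10,000 steps for Fe128 (the step count is borrowed from the Be run). The function name md_wall_time_days is hypothetical.

```python
SECONDS_PER_DAY = 86_400.0


def md_wall_time_days(n_steps, seconds_per_step):
    """Estimated wall time in days, assuming a constant time per MD step."""
    return n_steps * seconds_per_step / SECONDS_PER_DAY


if __name__ == "__main__":
    n_steps = 10_000  # assumption: same trajectory length as the Be run
    current = md_wall_time_days(n_steps, 90.0)  # ~90s per SCF cycle today
    target = md_wall_time_days(n_steps, 30.0)   # ~30s per SCF cycle target
    print(f"current: {current:.1f} days, target: {target:.1f} days")
    print(f"required speedup: {90.0 / 30.0:.1f}x")
```

Under these assumptions the trajectory would take roughly 10.4 days at 90s per step and roughly 3.5 days at 30s per step, so the target corresponds to a 3x speedup.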
Kind regards
Lenz
PhD Student (HZDR / CASUS)
On Sat, Jun 19, 2021 at 09:25, Paolo Giannozzi <
p.giannozzi at gmail.com> wrote:
> I tried your Fe job on a 36-core machine (with Gamma point to save time
> and memory) and found no evidence of memory leaks after more than 100 steps.
>
>> The best performance I was able to achieve so far was with 144 cores
>> defaulting to -nb 144, so am I correct to assume that I should try e.g. -nb
>> 144 -ntg 2 for 288 cores?
>>
>
> You should not use option -nb except in some rather special cases.
>
> Paolo
>
>
> PhD Student (HZDR / CASUS)
>>
>> On Wed, Jun 16, 2021 at 07:33, Paolo Giannozzi <
>> p.giannozzi at gmail.com> wrote:
>>
>>> Hard to say without knowing exactly what goes out of which memory
>>> limits. Note that not all arrays are distributed across processors, so a
>>> considerable number of arrays are replicated on all processes. As a
>>> consequence the total amount of required memory will increase with the
>>> number of MPI processes. Also note that a 128-atom cell is not "large" and
>>> 144 cores are not "a small number of processors". You will not get any
>>> advantage by just increasing the number of processors any more, quite the
>>> opposite. If you have too many idle cores, you should consider
>>> - "task group" parallelization (option -ntg)
>>> - MPI+OpenMP parallelization (configure --enable-openmp)
>>> Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP
>>> (while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of
>>> 1 1 1 0 0 0 will be faster and take less memory.
>>>
>>> Paolo
>>>
>>> On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.lenz at gmail.com>
>>> wrote:
>>>
>>>> Dear users,
>>>>
>>>> I am trying to perform a MD simulation for a large cell (128 Fe atoms,
>>>> gamma point) using pw.x and I get a strange scaling behavior. To test the
>>>> performance I ran the same MD simulation with an increasing number of nodes
>>>> (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful
>>>> when using 2, 4, and 6 nodes, i.e. 48, 96 and 144 cores, respectively
>>>> (albeit slow, which is within my expectations for such a small number of
>>>> processors).
>>>> Going to 8 and more nodes, I run into an out-of-memory error after
>>>> about two time steps.
>>>> I am a little bit confused as to what could be the reason. Since a
>>>> smaller number of cores works, I would expect a higher number of cores
>>>> to run without an OOM error as well.
>>>> The 8-node run explicitly outputs at the beginning:
>>>> " Estimated max dynamical RAM per process > 140.54 MB
>>>> Estimated total dynamical RAM > 26.35 GB
>>>> "
>>>>
>>>> which is well within the 2.5 GB I have allocated for each core.
>>>> I am obviously doing something wrong; could anyone point out what it is?
>>>> The input files for the 6- and 8-node runs can be found here:
>>>> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
>>>> I am using QE6.6.
>>>>
>>>> Kind regards
>>>> Lenz
>>>>
>>>> PhD Student (HZDR / CASUS)
>>>> _______________________________________________
>>>> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
>>>> users mailing list users at lists.quantum-espresso.org
>>>> https://lists.quantum-espresso.org/mailman/listinfo/users
>>>
>>>
>>>
>>> --
>>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>>> Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
>>> Phone +39-0432-558216, fax +39-0432-558222
>>>
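
The memory figures quoted in the original post can be checked with a few lines of arithmetic, which also illustrates the point about replicated arrays. This is only a sketch under stated assumptions: one MPI process per core (8 nodes x 24 cores = 192 processes), and 1 GB = 1024 MB, which is inferred here because it reproduces the printed total. The function names are hypothetical, and the replicated/distributed sizes in the second part are invented purely for illustration, not taken from any run.

```python
def total_estimate_gb(per_process_mb, n_processes):
    """Total memory in GB if every process uses per_process_mb (1 GB = 1024 MB)."""
    return per_process_mb * n_processes / 1024.0


def total_memory_mb(n_processes, replicated_mb, distributed_mb):
    """Toy model: replicated arrays exist once per process, while
    distributed arrays are split across processes and counted once."""
    return n_processes * replicated_mb + distributed_mb


if __name__ == "__main__":
    n_processes = 8 * 24  # assumption: one MPI process per core
    total = total_estimate_gb(140.54, n_processes)
    print(f"{n_processes} processes -> {total:.2f} GB")  # 192 processes -> 26.35 GB

    # Hypothetical sizes, chosen only to show the trend.
    for n in (48, 96, 144, 192):
        print(n, total_memory_mb(n, replicated_mb=100.0, distributed_mb=8000.0))
```

The first part reproduces the reported 26.35 GB total from the 140.54 MB per-process figure, which is far below a 2.5 GB per-core allocation. Note that pw.x prints these values as lower bounds (">"), so the check says nothing about memory that grows during the run. The second part shows why the total requirement rises with the process count whenever some arrays are replicated: the replicated term scales linearly with the number of processes.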