[QE-users] MD runs out of memory with increasing number of cores

Paolo Giannozzi p.giannozzi at gmail.com
Sat Jun 19 09:24:41 CEST 2021


I tried your Fe job on a 36-core machine (with Gamma point to save time and
memory) and found no evidence of memory leaks after more than 100 steps.

> The best performance I was able to achieve so far was with 144 cores
> defaulting to -nb 144, so am I correct to assume that I should try e.g. -nb
> 144 -ntg 2 for 288 cores?
>

You should not use option -nb except in some rather special cases.
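For 288 cores that would mean keeping the -ntg 2 you mention but simply
dropping -nb, along these lines (a minimal sketch; the mpirun launcher and
the md.in/md.out file names are placeholders for whatever your machine and
job actually use):

  # 288 MPI processes, 2 task groups, no band-group parallelization
  mpirun -np 288 pw.x -ntg 2 -in md.in > md.out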

Paolo


> PhD Student (HZDR / CASUS)
>
> On Wed, Jun 16, 2021 at 07:33, Paolo Giannozzi <p.giannozzi at gmail.com> wrote:
>
>> Hard to say without knowing exactly what exceeds which memory limit.
>> Note that not all arrays are distributed across processors, so a
>> considerable number of arrays are replicated on all processes. As a
>> consequence, the total amount of required memory will increase with
>> the number of MPI processes. Also note that a 128-atom cell is not
>> "large" and 144 cores are not "a small number of processors". You will
>> not get any further advantage by just increasing the number of
>> processors, quite the opposite. If you have too many idle cores, you
>> should consider
>> - "task group" parallelization (option -ntg)
>> - MPI+OpenMP parallelization (configure --enable-openmp), sketched below
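>> For the MPI+OpenMP route, a minimal sketch (assuming a build configured
>> with --enable-openmp; the mpirun launcher, the 72x4 split, and the
>> md.in/md.out file names are only placeholders to illustrate the idea):
>>
>>   # hypothetical hybrid layout: 72 MPI processes x 4 OpenMP threads = 288 cores
>>   export OMP_NUM_THREADS=4
>>   mpirun -np 72 pw.x -in md.in > md.out
>>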
>> Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP
>> (while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of
>> 1 1 1 0 0 0 will be faster and take less memory.
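>>
>> Spelled out, that is just replacing the K_POINTS card in the input
>> (assuming it uses the automatic 1 1 1 0 0 0 mesh mentioned above):
>>
>>   K_POINTS automatic
>>     1 1 1 0 0 0
>>
>> with
>>
>>   K_POINTS gamma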
>>
>> Paolo
>>
>> On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.lenz at gmail.com>
>> wrote:
>>
>>> Dear users,
>>>
>>> I am trying to perform an MD simulation for a large cell (128 Fe atoms,
>>> gamma point) using pw.x and I get a strange scaling behavior. To test the
>>> performance I ran the same MD simulation with an increasing number of nodes
>>> (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful
>>> when using 2, 4, and 6 nodes, so 48, 96, and 144 cores respectively (albeit
>>> slow, which is within my expectations for such a small number of processors).
>>> Going to 8 and more nodes, I run into an out-of-memory error after about
>>> two time steps.
>>> I am a little bit confused as to what the reason could be. Since a
>>> smaller number of cores works, I would expect a higher number of cores
>>> to run without an OOM error as well.
>>> The 8-node run explicitly outputs at the beginning:
>>> "     Estimated max dynamical RAM per process >     140.54 MB
>>>       Estimated total dynamical RAM >      26.35 GB
>>> "
>>>
>>> which is well within the 2.5 GB I have allocated for each core.
>>> I am obviously doing something wrong; could anyone point out what it is?
>>> The input files for a 6 and 8 node run can be found here:
>>> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
>>> I am using QE6.6.
>>>
>>> Kind regards
>>> Lenz
>>>
>>> PhD Student (HZDR / CASUS)
>>
>>
>>
>> --
>> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
>> Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
>> Phone +39-0432-558216, fax +39-0432-558222
>>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222