[QE-users] MD runs out of memory with increasing number of cores

Paolo Giannozzi p.giannozzi at gmail.com
Wed Jun 16 07:32:40 CEST 2021


Hard to say without knowing exactly what exceeds which memory limit.
Note that not all arrays are distributed across processors: a considerable
number of arrays are replicated on every MPI process, so the total amount
of required memory increases with the number of MPI processes. Also note
that a 128-atom cell is not "large" and 144 cores are not "a small number
of processors": you will not gain anything by increasing the number of
processors further, quite the opposite. If you have too many idle cores,
you should consider (an example launch line is sketched after the list)
- "task group" parallelization (option -ntg)
- MPI+OpenMP parallelization (configure --enable-openmp)
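For instance (the process/thread counts below are only placeholders, not a
tuned recommendation for your machine), a hybrid MPI+OpenMP run with task
groups could be launched along these lines:

    # hypothetical layout: 8 nodes x 6 MPI tasks per node x 4 OpenMP threads
    # = 192 cores in total
    export OMP_NUM_THREADS=4
    mpirun -np 48 pw.x -ntg 2 -inp md.in > md.out

Here -ntg sets the number of FFT task groups and OMP_NUM_THREADS the number
of OpenMP threads per MPI task; the threads only help if pw.x was built
with --enable-openmp.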
Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP
(while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of
1 1 1 0 0 0 will be faster and take less memory.
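Assuming your input currently specifies an automatic 1x1x1 grid, the change
is only in the k-point card, e.g. replacing

    K_POINTS automatic
    1 1 1 0 0 0

with

    K_POINTS gamma

so that pw.x switches to the Gamma-only algorithms (real wavefunctions,
only half of the plane-wave coefficients stored), which roughly halves the
memory for the wavefunctions and speeds up the FFTs.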

Paolo

On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <fiedler.lenz at gmail.com> wrote:

> Dear users,
>
> I am trying to perform an MD simulation for a large cell (128 Fe atoms,
> gamma point) using pw.x, and I am seeing strange scaling behavior. To test the
> performance I ran the same MD simulation with an increasing number of nodes
> (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful
> when using 2, 4, and 6 nodes, i.e. 48, 96, and 144 cores respectively (albeit slow,
> which is within my expectations for such a small number of processors).
> Going to 8 and more nodes, I run into an out-of-memory error after about
> two time steps.
> I am a little confused as to what the reason could be. Since a smaller
> number of cores works, I would expect a higher number of cores to run
> without an out-of-memory error as well.
> The 8-node run explicitly reports at the beginning:
> "     Estimated max dynamical RAM per process >     140.54 MB
>       Estimated total dynamical RAM >      26.35 GB
> "
>
> which is well within the 2.5 GB I have allocated for each core.
> I am obviously doing something wrong; could anyone point out what it is?
> The input files for a 6 and 8 node run can be found here:
> https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing
> I am using QE6.6.
>
> Kind regards
> Lenz
>
> PhD Student (HZDR / CASUS)
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222