<div dir="ltr"><div>Dear Prof. Giannozzi,</div><div><br></div><div>Ah, I understand; that makes sense. Do you have any advice on how best to track down such a memory leak in this case? The behavior is reproducible with my setup. <br></div><div><br></div><div>Kind regards</div><div>Lenz<br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 23, 2021 at 2:02 PM Paolo Giannozzi <<a href="mailto:p.giannozzi@gmail.com">p.giannozzi@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Wed, Jun 23, 2021 at 1:31 PM Lenz Fiedler <<a href="mailto:fiedler.lenz@gmail.com" target="_blank">fiedler.lenz@gmail.com</a>> wrote:</div><div dir="ltr"><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> (and the increase in the number of processes is most likely the reason for the error, if I understand you correctly?)</div></div></blockquote><div><br></div><div>Not exactly. Too many processes may result in too much global memory usage, because some arrays are replicated on each process. If you exceed the globally available memory, the code will crash. BUT: it will do so during the first MD step, not after 2000 MD steps. The memory usage should not increase with the number of MD time steps. If it does, there is a memory leak, either in the code or somewhere else (libraries, etc.).<br></div><div><br></div><div>Paolo</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> this is not a problem. 
For my beryllium calculation it is more problematic, since the 144-processor case really gives the best performance (I have uploaded a file called performance_Be128.png with my timing results), but I still run out of memory after 2700 time steps. This is also manageable, though, since I can always restart the calculation and perform another 2700 time steps; this way I was able to perform 10,000 time steps in just over a day. I am running more calculations on larger Be and Fe cells and will investigate this behavior there. <br></div><div><br></div><div>I have also used the "gamma" option for K_POINTS to get the performance benefits you outlined. For the Fe128 cell, I achieved the best performance with 144 processors and the "gamma" option (about 90 s per SCF cycle). I am still not within my personal target of ~30 s per SCF cycle, but I will start looking into my choice of pseudopotential and cutoff (along with OpenMP and task-group parallelization) rather than blindly throwing more and more processors at the problem. <br></div><div><br></div><div><span>Kind regards<br>Lenz<br><br>PhD Student (HZDR / CASUS)</span></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jun 19, 2021 at 9:25 AM Paolo Giannozzi <<a href="mailto:p.giannozzi@gmail.com" target="_blank">p.giannozzi@gmail.com</a>> wrote:<br></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I tried your Fe job on a 36-core machine (with the Gamma point to save time and memory) and found no evidence of memory leaks after more than 100 steps.<br></div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">The best performance I have been able to achieve so far was with 144 cores, defaulting to -nb 144, so am I correct to assume that I should try, e.g., -nb 144 -ntg 2 for 288 cores? <br></div></blockquote><div><br></div>You should not use option -nb except in some rather special cases.</div><div class="gmail_quote"><br></div><div class="gmail_quote">Paolo<br></div><div class="gmail_quote"><br></div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">PhD Student (HZDR / CASUS)</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 16, 2021 at 7:33 AM Paolo Giannozzi <<a href="mailto:p.giannozzi@gmail.com" target="_blank">p.giannozzi@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hard to say without knowing exactly what exceeds which memory limit. Note that not all arrays are distributed across processors, so a considerable number of arrays are replicated on all processes. As a consequence, the total amount of required memory increases with the number of MPI processes. Also note that a 128-atom cell is not "large" and 144 cores are not "a small number of processors". 
You will not gain anything by further increasing the number of processors; quite the opposite. If you have too many idle cores, you should consider</div><div>- "task group" parallelization (option -ntg)</div><div>- MPI+OpenMP parallelization (configure --enable-openmp)<br></div><div>Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP (while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of 1 1 1 0 0 0 will be faster and take less memory.</div><div><br></div><div>Paolo<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <<a href="mailto:fiedler.lenz@gmail.com" target="_blank">fiedler.lenz@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Dear users,<br><br>I am trying to perform an MD simulation for a large cell (128 Fe atoms, gamma point) using pw.x, and I observe a strange scaling behavior. To test the performance, I ran the same MD simulation with an increasing number of nodes (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful when using 2, 4, and 6 nodes, i.e., 48, 96, and 144 cores, respectively (albeit slow, which is within my expectations for such a small number of processors). <br>With 8 or more nodes, I run into an out-of-memory error after about two time steps.<br>I am a little confused as to what the reason could be: since a smaller number of cores works, I would expect a higher number of cores to run without an OOM error as well. <br>The 8-node run explicitly prints at the beginning:<br>"     Estimated max dynamical RAM per process >     140.54 MB<br>      Estimated total dynamical RAM >      26.35 GB<br>"<br><br>which is well within the 2.5 GB I have allocated for each core. 
<br>I am obviously doing something wrong; could anyone point out what it is?<br>The input files for the 6- and 8-node runs can be found here: <a href="https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing" target="_blank">https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing</a><br>I am using QE 6.6. <br><br>Kind regards<br>Lenz<br><br>PhD Student (HZDR / CASUS)</div>
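<div>A quick way to compare the estimate printed by pw.x with the per-core allocation is to multiply the per-process figure by the number of MPI ranks per node. The sketch below is a hypothetical helper (parse_ram_estimate and node_estimate_gb are illustrative names, not part of Quantum ESPRESSO); it assumes the "Estimated ... RAM" lines have exactly the format quoted above and that 1 GB = 1024 MB. Keep in mind that these lines are pw.x's own estimate of dynamically allocated memory, not a measurement, so they may not account for every allocation (for example by MPI or external libraries).</div>

```python
import re

# Matches lines such as "Estimated max dynamical RAM per process >     140.54 MB"
# (format assumed from the pw.x output quoted in this thread).
_RAM_LINE = re.compile(
    r"Estimated (max dynamical RAM per process|total dynamical RAM)\s*>\s*([\d.]+)\s*(MB|GB)"
)
# Assumption: 1 GB = 1024 MB.
_TO_GB = {"MB": 1.0 / 1024.0, "GB": 1.0}


def parse_ram_estimate(text):
    """Return {'per_process_gb': ..., 'total_gb': ...} found in pw.x output text."""
    result = {}
    for label, value, unit in _RAM_LINE.findall(text):
        key = "per_process_gb" if label.startswith("max") else "total_gb"
        result[key] = float(value) * _TO_GB[unit]
    return result


def node_estimate_gb(per_process_gb, ranks_per_node):
    """Worst-case estimate for one node if every rank reaches the per-process maximum."""
    return per_process_gb * ranks_per_node


if __name__ == "__main__":
    out = """
     Estimated max dynamical RAM per process >     140.54 MB
     Estimated total dynamical RAM >      26.35 GB
"""
    est = parse_ram_estimate(out)
    ranks_per_node = 24       # 24 cores per node, one MPI rank per core
    limit_per_core_gb = 2.5   # per-core allocation quoted in the message
    node_gb = node_estimate_gb(est["per_process_gb"], ranks_per_node)
    print(f"estimated per node: {node_gb:.2f} GB, "
          f"allocated per node: {ranks_per_node * limit_per_core_gb:.1f} GB")
```

<div>With the numbers above this gives about 3.3 GB per 24-rank node against 60 GB allocated, and 192 × 140.54 MB ≈ 26.35 GB, which matches the printed total; the printed total therefore appears to be simply the per-process maximum times the number of ranks.</div>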

_______________________________________________<br>

Quantum ESPRESSO is supported by MaX (<a href="http://www.max-centre.eu" rel="noreferrer" target="_blank">www.max-centre.eu</a>)<br>

users mailing list <a href="mailto:users@lists.quantum-espresso.org" target="_blank">users@lists.quantum-espresso.org</a><br>

<a href="https://lists.quantum-espresso.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.quantum-espresso.org/mailman/listinfo/users</a></blockquote></div><br clear="all"><br>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. Udine, via delle Scienze 206, 33100 Udine, Italy<br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>
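<div>The effect described above, replicated arrays making the total memory grow with the number of MPI processes, can be illustrated with a toy two-term model: if each of N processes holds a replicated part m_rep plus an even 1/N share of distributed data D, the memory per process is m_rep + D/N and the total is N·m_rep + D. The sketch below assumes exactly this split, which is a simplification (real pw.x arrays are not all either fully replicated or perfectly distributed), and the sizes in it are hypothetical, not measured.</div>

```python
def per_process_gb(n_procs, replicated_gb, distributed_gb):
    """Toy model: replicated arrays are copied on every MPI process,
    distributed arrays are split evenly across the n_procs processes."""
    return replicated_gb + distributed_gb / n_procs


def total_gb(n_procs, replicated_gb, distributed_gb):
    """Memory summed over all processes: n_procs * replicated + distributed."""
    return n_procs * per_process_gb(n_procs, replicated_gb, distributed_gb)


if __name__ == "__main__":
    # Hypothetical sizes for illustration only; they are not pw.x measurements.
    replicated, distributed = 0.05, 10.0
    for n in (48, 96, 144, 192):
        print(f"{n:4d} processes: {per_process_gb(n, replicated, distributed):.3f} GB each, "
              f"{total_gb(n, replicated, distributed):.1f} GB in total")
```

<div>In this model the memory per process decreases towards m_rep while the total grows linearly with N; neither depends on the MD step, which is consistent with the point made in this thread that a replication-driven out-of-memory error should appear at the start of the run, not after thousands of steps.</div>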

</blockquote></div>

</blockquote></div></div>
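<div>Following the suggestions to leave -nb at its default and to use task groups (-ntg) and MPI+OpenMP instead, a hybrid launch can be assembled as below. This is a sketch under several assumptions: pw_launch is a hypothetical helper, "mpirun -np" stands in for whatever launcher the cluster actually uses (e.g. srun under Slurm), the input file name is made up, and OpenMP threads only take effect if pw.x was built with configure --enable-openmp. The layout in the example is illustrative, not a tuned recommendation.</div>

```python
import os
import shlex


def pw_launch(nodes, cores_per_node, omp_threads=1, ntg=1, input_file="md.in"):
    """Build (argv, env) for a hybrid MPI+OpenMP pw.x run (hypothetical helper).

    One MPI rank is placed on every `omp_threads` cores. Following the advice in
    this thread, -nb is never set; -ntg is added only when task groups are requested.
    """
    if cores_per_node % omp_threads != 0:
        raise ValueError("omp_threads should divide cores_per_node")
    ranks = nodes * cores_per_node // omp_threads
    # "mpirun -np" is an assumption; substitute the cluster's own launcher.
    argv = ["mpirun", "-np", str(ranks), "pw.x", "-inp", input_file]
    if ntg > 1:
        argv += ["-ntg", str(ntg)]
    # OMP_NUM_THREADS only matters if pw.x was configured with --enable-openmp.
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    return argv, env


if __name__ == "__main__":
    # 6 nodes x 24 cores as in this thread; 2 threads per rank and 2 task groups
    # are illustrative choices, and "fe128_md.in" is a made-up file name.
    argv, env = pw_launch(6, 24, omp_threads=2, ntg=2, input_file="fe128_md.in")
    print("OMP_NUM_THREADS=" + env["OMP_NUM_THREADS"], shlex.join(argv))
    # To actually launch: subprocess.run(argv, env=env, check=True)
```

<div>Keeping the total core count fixed while trading MPI ranks for OpenMP threads reduces the number of processes that each carry a copy of the replicated arrays, which is one reason the hybrid mode can help with memory as well as with idle cores.</div>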

</blockquote></div>
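<div>The gap between about 90 s per SCF cycle on 144 processes and the ~30 s target can be put in numbers with the usual strong-scaling efficiency E = (t_ref · p_ref) / (t · p). Solving for p gives the number of processes needed to reach a target time at a given efficiency. The sketch below only does this arithmetic; the helper names are hypothetical, and the 50% efficiency in the example is an assumed value, not a measurement.</div>

```python
def parallel_efficiency(p_ref, t_ref, p, t):
    """Strong-scaling efficiency of a run on p processes taking t seconds,
    relative to a reference run on p_ref processes taking t_ref seconds."""
    return (t_ref * p_ref) / (t * p)


def processes_needed(p_ref, t_ref, t_target, efficiency=1.0):
    """Processes needed to reach t_target if the efficiency relative to the
    reference run were `efficiency` (1.0 means perfect strong scaling)."""
    return (t_ref * p_ref) / (t_target * efficiency)


if __name__ == "__main__":
    # 144 processes, ~90 s per SCF cycle and a ~30 s target are from this thread.
    print(processes_needed(144, 90.0, 30.0))       # 432.0 (perfect scaling)
    print(processes_needed(144, 90.0, 30.0, 0.5))  # 864.0 (assumed 50% efficiency)
```

<div>Even perfect scaling would require 432 processes, three times the current count, and any drop in efficiency beyond 144 processes pushes that number higher, which supports looking at the cost per process (cutoff, pseudopotential) rather than adding processors. Timings measured at several process counts can be passed to parallel_efficiency to see where the efficiency falls off.</div>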

</blockquote></div></div>

</blockquote></div>
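<div>One way to follow up on the reproducible memory growth is to first confirm and quantify it: sample the resident memory of the pw.x processes on a node at regular intervals during the MD run and fit a linear trend. Memory that keeps rising with the number of MD steps, rather than staying flat, is the signature of a leak described above. The sketch below is Linux-specific (it reads /proc/PID/status), and all function names are hypothetical. Because RSS counts pages shared between processes (for example MPI shared-memory buffers) in every process, the summed value overestimates the real usage; the trend is what matters.</div>

```python
import os
import time


def rss_kib(pid):
    """Resident set size of one process, read from /proc/PID/status (Linux)."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # reported in kB (1024-byte units)
    return 0  # kernel threads and zombies have no VmRSS line


def pids_named(name):
    """PIDs on this node whose command name (/proc/PID/comm, max 15 chars) equals name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            pass  # the process exited while /proc was being scanned
    return pids


def slope(xs, ys):
    """Ordinary least-squares slope of ys versus xs (needs two distinct xs)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def monitor(name="pw.x", interval_s=60.0, samples=30):
    """Return (elapsed seconds, summed RSS in MiB) for all processes called name."""
    times, totals = [], []
    t0 = time.time()
    for i in range(samples):
        total_kib = 0
        for pid in pids_named(name):
            try:
                total_kib += rss_kib(pid)
            except OSError:
                pass  # process vanished between listing and reading
        times.append(time.time() - t0)
        totals.append(total_kib / 1024.0)
        if i + 1 < samples:
            time.sleep(interval_s)
    return times, totals
```

<div>For example, running t, mib = monitor(interval_s=60.0, samples=120) in a separate shell on one of the compute nodes and then slope(t, mib) * 3600 gives the growth in MiB per hour. Repeating the measurement with a different MPI library or numerical-library build may help narrow down whether the growth comes from pw.x itself or from one of the libraries, the two possibilities mentioned above.</div>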