<div dir="ltr">Dear Prof. Giannozzi, <br><br>Thanks so much for the insight! I realize I might have left out a crucial piece of information: The oom error does not appear right away, it appears after a certain number of time steps (and as far as I can tell, somewhat reproducibly). For the 192 core example I've sent, this number was 4 time steps. Parallel to these iron calculations I am also doing investigations on Be, and have found a similar behavior there. I have uploaded the input files for a Beryllium run in the Google Drive folder of my first message. For this calculation, I can do ~2700 time steps just fine (which took about 8 hours) and only get the oom error then. Is there some sort of option I am forgetting to set that leads to some arrays being accumulated and eventually overflowing? <br><br>I understand that just using more and more processors will not necessarily give me a better performance, but during the performance test I did I found that by going from 48 processors to 144 I could reduce the average time per time step from over 1000s to 200s (a plot for this is in the google drive folder as well). I am aiming for ~30s per time step, since I want to perform 10000 time steps to get a 10ps trajectory, thus I was trying to investigate how performance would be affected if I used slightly more processors. I will try the ntg option. The best performance I was able to achieve so far was with 144 cores defaulting to -nb 144, so am I correct to assume that I should try e.g. -nb 144 -ntg 2 for 288 cores? <br><br>The 80Ry cutoff was the result of a convergence analysis I did for this system, although I could maybe decrease this number since I am more interested in sampling configurations for a Machine Learning application and less in macroscopic properties derived directly from the MD calculation. <br><br>Kind regards<br>Lenz<br><br>PhD Student (HZDR / CASUS)</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Am Mi., 16. Juni 2021 um 07:33 Uhr schrieb Paolo Giannozzi <<a href="mailto:p.giannozzi@gmail.com">p.giannozzi@gmail.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hard to say without knowing exactly what goes out of which memory limits. Note that not all arrays are distributed across processors, so a considerable number of arrays are replicated on all processes. As a consequence the total amount of required memory will increase with the number of mpi processes. Also note that a 128-atom cell is not "large" and 144 cores are not "a small number of processors". You will not get any advantage by just increasing the number of processors any more, quite the opposite. 
If you have too many idle cores, you should consider</div><div>- "task group" parallelization (option -ntg)</div><div>- MPI+OpenMP parallelization (configure --enable-openmp)<br></div><div>Please also note that ecutwfc=80 Ry is a rather large cutoff for a USPP (while ecutrho=320 is fine) and that running with K_POINTS Gamma instead of 1 1 1 0 0 0 will be faster and take less memory.</div><div><br></div><div>Paolo<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 14, 2021 at 4:22 PM Lenz Fiedler <<a href="mailto:fiedler.lenz@gmail.com" target="_blank">fiedler.lenz@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Dear users,<br><br>I am trying to perform an MD simulation for a large cell (128 Fe atoms, Gamma point) using pw.x and I am getting strange scaling behavior. To test the performance I ran the same MD simulation with an increasing number of nodes (2, 4, 6, 8, etc.) using 24 cores per node. The simulation is successful with 2, 4, and 6 nodes, i.e. 48, 96, and 144 cores, respectively (albeit slow, which is within my expectations for such a small number of processors).<br>Going to 8 or more nodes, I run into an out-of-memory error after about two time steps.<br>I am a little bit confused as to what the reason could be. Since a smaller number of cores works, I would not expect a higher number of cores to run into an OOM error.<br>The 8-node run explicitly outputs at the beginning:<br>" Estimated max dynamical RAM per process > 140.54 MB<br> Estimated total dynamical RAM > 26.35 GB<br>"<br><br>which is well within the 2.5 GB I have allocated for each core.<br>I am obviously doing something wrong; could anyone point out what it is?<br>The input files for a 6- and an 8-node run can be found here: <a href="https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing" target="_blank">https://drive.google.com/drive/folders/1kro3ooa2OngvddB8RL-6Iyvdc07xADNJ?usp=sharing</a><br>I am using QE 6.6.<br><br>Kind regards<br>Lenz<br><br>PhD Student (HZDR / CASUS)</div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div>Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>Univ. Udine, via delle Scienze 206, 33100 Udine, Italy<br>Phone +39-0432-558216, fax +39-0432-558222<br><br></div></div></div></div></div>
_______________________________________________<br>
Quantum ESPRESSO is supported by MaX (<a href="http://www.max-centre.eu" rel="noreferrer" target="_blank">www.max-centre.eu</a>)<br>
users mailing list <a href="mailto:users@lists.quantum-espresso.org" target="_blank">users@lists.quantum-espresso.org</a><br>
<a href="https://lists.quantum-espresso.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.quantum-espresso.org/mailman/listinfo/users</a></blockquote></div>