<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
Hello</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
Your system is large enough that parallel execution should be beneficial up to about 50 to 60 MPI ranks without any further options. If you plan to use something like 200 cores, the simplest yet effective approach is to use an executable compiled
with hybrid MPI + OpenMP parallelism and run it with 4 OpenMP threads per MPI rank. <br>
</div>
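<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
As a rough sketch of that arithmetic, assuming exactly 200 cores and a pw.x build with OpenMP enabled: 4 threads per rank leaves 50 MPI ranks, which stays within the range quoted above. The input file name input.in is a placeholder, and the script only prints the launch line instead of starting a run. <br>
</div>
<pre>
```shell
#!/bin/sh
# Assumed values from the example above: 200 cores, 4 OpenMP threads per MPI rank.
total_cores=200
threads_per_rank=4

# Each MPI rank occupies threads_per_rank cores, so the rank count is the quotient.
mpi_ranks=$((total_cores / threads_per_rank))

# OpenMP reads the thread count from this environment variable.
export OMP_NUM_THREADS="$threads_per_rank"

# Print the launch line rather than running it; input.in is a placeholder name.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS mpirun -n $mpi_ranks pw.x -in input.in"
```
</pre>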
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
Regarding your test on the 8-core machine: judging from your output, it seems to me that you are using a multithreaded linear algebra library.
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
If that is the case, you should make sure that the number of MPI ranks times the number of threads per rank does not exceed the number of cores, 8 in your case. To set the number of OpenMP threads, set the OMP_NUM_THREADS environment variable, e.g.
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<span style="font-family: Mono;">export OMP_NUM_THREADS=2</span> if you run with 4 MPI ranks
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
or <br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<span style="font-family: Mono;">export OMP_NUM_THREADS=1</span> if you run with 8 MPI ranks
<br>
</div>
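<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
The rule above (MPI ranks times threads must not exceed the core count) can be checked with a small shell sketch. The helper fits_on_cores is a hypothetical name introduced here for illustration and is not part of Quantum ESPRESSO; the script only prints the suggested settings for the 8-core case. <br>
</div>
<pre>
```shell
#!/bin/sh
# fits_on_cores RANKS THREADS CORES
# Hypothetical helper: succeeds when RANKS * THREADS does not exceed CORES.
fits_on_cores() {
    [ $(( $1 * $2 )) -le "$3" ]
}

cores=8   # the 8-core machine discussed above

for ranks in 4 8; do
    # Largest thread count per rank that still fits on the machine.
    threads=$((cores / ranks))
    if fits_on_cores "$ranks" "$threads" "$cores"; then
        echo "mpirun -n $ranks: export OMP_NUM_THREADS=$threads"
    fi
done
```
</pre>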
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)" class="elementToProof">
Hope this helps. Best regards -- Pietro <br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr" class="elementToProof"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> users &lt;users-bounces@lists.quantum-espresso.org&gt; on behalf of Robert Fleming &lt;rofleming@astate.edu&gt;<br>
</font></div>
<div dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>Sent:</b> Wednesday, 13 April 2022 16:36<br>
<b>To:</b> Quantum ESPRESSO users Forum &lt;users@lists.quantum-espresso.org&gt;<br>
<b>Subject:</b> [QE-users] Advice for Parallel Execution</font>
<div> </div>
</div>
<div>
<div class="" style="word-wrap:break-word; line-break:after-white-space">Greetings,
<div class=""><br class="">
</div>
<div class="">I’m running SCF calculations on an amorphous Si surface terminated with different functional groups (input file attached for context), and I’m experiencing poor scaling behavior in parallel. I’m running these jobs locally on an 8-core CPU to
test before scaling up the system size on an HPC cluster.</div>
<div class=""><br class="">
</div>
<div class="">I’ve noticed that increasing the number of MPI processes from 4 to 8 (mpirun -n 4 pw.x -in [myscript] vs. mpirun -n 8 pw.x -in [myscript]) results in the job taking longer (~1 hr vs. ~1.5 hr). Looking through the documentation, I see that there
are several command-line switches for different levels of parallelization beyond just the number of MPI processes. While it’s possible that my system is too small to benefit from parallel execution (24 atoms, 103 electrons), I think it’s
more likely that I’m not taking appropriate advantage of these.</div>
<div class=""><br class="">
</div>
<div class="">Would anyone be willing to share some advice or “rules of thumb” on the best way to select these parallelization levels for small-to-medium-sized jobs (say, an 8-core CPU vs. 150-200 processors on an HPC platform)?</div>
<div class=""><br class="">
</div>
<div class="">Thank you,<br class="">
<div class="">
<div dir="auto" class="" style="color:rgb(0,0,0); letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; word-wrap:break-word; line-break:after-white-space">
<div dir="auto" class="" style="color:rgb(0,0,0); letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; word-wrap:break-word; line-break:after-white-space">
<div dir="auto" class="" style="color:rgb(0,0,0); letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; word-wrap:break-word; line-break:after-white-space">
<div>________________________________</div>
<div>Robert “Drew” Fleming, Ph.D.</div>
<div>Assistant Professor of Mechanical Engineering</div>
<div>College of Engineering &amp; Computer Science</div>
<div>Arkansas State University</div>
<div>(870) 972-3743</div>
<div><a href="mailto:rofleming@AState.edu" data-auth="NotApplicable" class="">rofleming@AState.edu</a></div>
<div><br class="">
</div>
<div class=""></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>