[QE-users] Re: Advice for Parallel Execution
Pietro Davide Delugas
pdelugas at sissa.it
Thu Apr 14 13:27:00 CEST 2022
Hello
Your system is large enough that parallel execution should be beneficial up to about 50 to 60 MPI ranks without any further options. If you are planning to use something like 200 cores, the simplest yet effective approach is to use an executable compiled with hybrid MPI + OpenMP parallelism and run it with 4 OpenMP threads per MPI rank.
Regarding your test on the 8-core machine: judging from your output, it seems to me that you are using a multithreaded linear algebra library.
If that is the case, you should make sure that the number of MPI ranks times the number of threads does not exceed the number of cores, which is 8 in your case. To set the number of OpenMP threads, set the OMP_NUM_THREADS environment variable, e.g.
export OMP_NUM_THREADS=2 if you run with 4 MPI ranks
or
export OMP_NUM_THREADS=1 if you run with 8 MPI ranks
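As a minimal sketch of the rule above (ranks times threads should not exceed the number of cores), the thread count can be derived from the core count instead of being typed by hand. The helper name threads_per_rank and the CORES and RANKS variables are hypothetical and used only for illustration; the script only prints the launch line rather than running pw.x, and [myscript] is the placeholder from the original question.

```shell
#!/bin/sh
# Hypothetical helper: largest thread count such that ranks * threads <= cores.
# Falls back to 1 thread if there are more ranks than cores (the node is then
# oversubscribed anyway, so the rank count should be reduced).
threads_per_rank() {
    cores=$1
    ranks=$2
    threads=$((cores / ranks))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

CORES=8   # cores on the local test machine discussed in this thread
RANKS=4   # MPI ranks to launch

OMP_NUM_THREADS=$(threads_per_rank "$CORES" "$RANKS")
export OMP_NUM_THREADS

# Print the launch line instead of executing it.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS mpirun -n $RANKS pw.x -in [myscript]"
```

With the same arithmetic, 200 cores and 50 MPI ranks give 4 threads per rank, which matches the hybrid setup suggested above.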
Hope this helps. Best regards,
Pietro
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Robert Fleming <rofleming at astate.edu>
Sent: Wednesday, 13 April 2022 16:36
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: [QE-users] Advice for Parallel Execution
Greetings,
I’m running scf calculations on an amorphous Si surface terminated with different functional groups (input file attached for context), and I’m experiencing poor scaling behavior in parallel. I’m running these jobs locally on an 8-core CPU to test before scaling up the system size to an hpc cluster.
I’ve noticed that increasing the number of MPI processes from 4 to 8 (mpirun -n 4 pw.x -in [myscript] vs. mpirun -n 8 pw.x -in [myscript]) results in the job taking longer (~ 1hr vs. 1.5 hr). Looking through the documentation, I see that there are several command line switches for different levels of parallelization beyond just the number of MPI processes. While it’s possible that my system size is potentially too small to benefit from parallel execution (24 atoms, 103 electrons), I think it’s probably more likely that I’m not appropriately taking advantage of these.
Would anyone be willing to share some advice or “rules of thumb” on the best way to select these parallelization levels for small-to-medium sized jobs (say, an 8 core CPU vs. 150-200 processors on an hpc platform)?
Thank you,
________________________________
Robert “Drew” Fleming, Ph.D.
Assistant Professor of Mechanical Engineering
College of Engineering & Computer Science
Arkansas State University
(870) 972-3743
rofleming at AState.edu