[QE-users] [SUSPECT ATTACHMENT REMOVED] inefficient parallelization of scf calculation
Julien Barbaud
julien_barbaud at sjtu.edu.cn
Wed Apr 10 10:48:33 CEST 2019
Thank you for pointing this out, and my apologies for this technical
problem. I converted the attachments to .txt files in the hope that the
mailers will let them through this time.
I also limited the simulation to 1 iteration as you suggested, in order
to get a proper final time report. Thank you for this advice.
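If I understood the suggestion correctly, the only change needed is the
maximum number of SCF iterations in the &ELECTRONS namelist of the input
file; this is roughly what I did (the other parameters are left as in
the attached input):

    &ELECTRONS
      electron_maxstep = 1   ! stop after a single SCF iteration to get the time report
      ...                    ! remaining parameters unchanged
    /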
I should also mention that I just noticed that the QE version on my
computer is 6.0 while the one on the cluster is 6.3. I would not expect
this to be the main cause of such a big discrepancy (especially since
the problem occurs on the side with the more recent version), but I will
try to update the software on my computer to 6.3 to see if this changes
anything.
Here is my initial question:
I am starting to use an HPC cluster at my university, but I am very new
to parallel computation.
I made a first test (test #1) on a very small simulation (relaxation of
a GO sheet with 19 atoms, at the gamma point only). The calculation took
3m20s on 1 process on my personal computer. On the cluster it took 1m5s
on 4 processes with default parallel options, and 44s on 8 processes.
This seems like reasonable behavior, and at least shows that raising the
number of processes does reduce the computation time in this case (with
the obvious limitation that too many processes for such a small job stop
helping).
However, I tried another, somewhat bigger test (test #2): an scf
calculation with 120 atoms (still at the gamma point only). In this
case, parallelization brings absolutely no improvement. Although the
output file confirms that the code is running on N processes, it
performs about the same as if it were running on 1 process (sometimes
even slightly worse, but probably not in a significant way, since the
times fluctuate a bit from one run to another).
I tried running this same input file on my personal computer on both 1
and 2 cores. It turns out that 10 iterations take 10376s on 1 core and
6777s on 2 cores, so the parallelization seems to be doing fine on my
computer.
I have tried running with different numbers of cores on the HPC cluster,
and with different parallelization options (for instance -nb 4), but
nothing seems to improve the time.
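Concretely, the kind of command I have been trying inside the job script
looks like the line below; the exact MPI launcher and options depend on
the cluster setup, so please take this as an illustration rather than
the exact command I submit:

    mpirun -np 8 pw.x -nb 4 -input test2.scf.in > test2.scf.out

As far as I understand, -nb controls the number of band groups; I also
tried leaving all parallelization flags at their default values.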
Basically, I am stuck with these two seemingly conflicting facts:
* Parallelization seems to have no particular problem on the HPC
cluster, because test #1 gives good results
* This particular input file (test #2) seems to have no particular
problem with parallelization, because it scales reasonably with the
number of processes on my individual computer
However, combining the two and running this file in parallel on the HPC
cluster ends up not working correctly…
I attached the input and output files of test #2, as well as the slurm
script that I use to submit the calculation to the job manager, in case
it helps (test2.scf.slurm.txt).
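For readability, the script is essentially of the following form; the
partition, module name and core counts below are placeholders rather
than the actual values, which are in the attached file:

    #!/bin/bash
    #SBATCH --job-name=test2.scf
    #SBATCH --partition=cpu          # placeholder partition name
    #SBATCH --nodes=1
    #SBATCH --ntasks=8               # number of MPI processes
    #SBATCH --time=02:00:00
    #SBATCH --output=test2.scf.%j.log

    module load quantum-espresso     # placeholder module name
    mpirun -np $SLURM_NTASKS pw.x -nb 4 -input test2.scf.in > test2.scf.out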
Any suggestion on what is going wrong would be very welcome.
Julien
On 10/04/2019 at 15:25, Paolo Giannozzi wrote:
> On Wed, Apr 10, 2019 at 9:08 AM JULIEN, CLAUDE, PIERRE BARBAUD
> <julien_barbaud at sjtu.edu.cn> wrote:
>
> I attached the input file and output file of test #2
>
>
> you did, but many mailers don't like those kinds of attachments and
> don't let them go through
>
> (this output only ran 1 iteration because I interrupted it).
>
>
> it is useful to stop the code by setting the maximum number of scf
> iterations to 1. In this way one can see the final time report.
>
> Paolo
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>