[QE-users] [SUSPECT ATTACHMENT REMOVED] inefficient parallelization of scf calculation
Julien Barbaud
julien_barbaud at sjtu.edu.cn
Wed Apr 10 10:48:33 CEST 2019
Thank you for pointing this out, and my apologies for this technical
problem. I converted the attachments to .txt files in the hope that the
mailers will let them through this time.
I also limited the simulation to 1 iteration as you suggested, in order
to get a proper final time report. Thank you for this advice.
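If I understood the suggestion correctly, the only change needed is the
maximum number of SCF iterations in the &ELECTRONS namelist of the input
file; this is roughly what I did (the other parameters are left as in
the attached input):

    &ELECTRONS
      electron_maxstep = 1   ! stop after a single SCF iteration to get the time report
      ...                    ! remaining parameters unchanged
    /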
I should also mention that I just noticed that the QE version on my
computer is 6.0 while the one on the cluster is 6.3. I would not expect
this to be the main cause of such a big discrepancy (especially since
the problem occurs on the side with the more recent version), but I will
try to update the software on my computer to 6.3 to see if this changes
anything.
Here is my initial question:
I am starting to use an HPC cluster at my university, but I am very new
to parallel computation.
I made a first test (test #1) on a very small simulation (relaxation of
a GO sheet with 19 atoms, at the gamma point only). The calculation took
3m20s on 1 process on my personal computer. On the cluster it took 1m5s
on 4 processes with default parallel options, and 44s on 8 processes.
This seems like reasonable behavior, and at least shows that raising the
number of processes does reduce the computation time in this case (with
the obvious limitation that too many processes for such a small job stop
helping).
However, I tried another, somewhat bigger test (test #2): an scf
calculation with 120 atoms (still at the gamma point only). In this
case, parallelization brings absolutely no improvement. Although the
output file confirms that the code is running on N processes, it
performs about the same as if it were running on 1 process (sometimes
even slightly worse, but probably not in a significant way, since the
times fluctuate a bit from one run to another).
I tried running this same input file on my personal computer on both 1
and 2 cores. It turns out that 10 iterations take 10376s on 1 core and
6777s on 2 cores, so the parallelization seems to be doing fine on my
computer.
I have tried running with different numbers of cores on the HPC cluster,
and with different parallelization options (for instance -nb 4), but
nothing seems to improve the time.
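Concretely, the kind of command I have been trying inside the job script
looks like the line below; the exact MPI launcher and options depend on
the cluster setup, so please take this as an illustration rather than
the exact command I submit:

    mpirun -np 8 pw.x -nb 4 -input test2.scf.in > test2.scf.out

As far as I understand, -nb controls the number of band groups; I also
tried leaving all parallelization flags at their default values.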
Basically, I am stuck with these two seemingly conflicting facts:
* Parallelization seems to have no particular problem on the HPC
cluster, because test #1 gives good results
* This particular input file (test #2) seems to have no particular
problem with parallelization, because it scales reasonably with the
number of processes on my individual computer
However, combining the two and running this file in parallel on the HPC
cluster ends up not working correctly…
I attached the input and output files of test #2, as well as the slurm
script that I use to submit the calculation to the job manager, in case
it helps (test2.scf.slurm.txt).
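For readability, the script is essentially of the following form; the
partition, module name and core counts below are placeholders rather
than the actual values, which are in the attached file:

    #!/bin/bash
    #SBATCH --job-name=test2.scf
    #SBATCH --partition=cpu          # placeholder partition name
    #SBATCH --nodes=1
    #SBATCH --ntasks=8               # number of MPI processes
    #SBATCH --time=02:00:00
    #SBATCH --output=test2.scf.%j.log

    module load quantum-espresso     # placeholder module name
    mpirun -np $SLURM_NTASKS pw.x -nb 4 -input test2.scf.in > test2.scf.out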
Any suggestion on what is going wrong would be very welcome.
Julien
On 10/04/2019 at 15:25, Paolo Giannozzi wrote:
> On Wed, Apr 10, 2019 at 9:08 AM JULIEN, CLAUDE, PIERRE BARBAUD
> <julien_barbaud at sjtu.edu.cn> wrote:
>
> I attached the input file and output file of test #2
>
>
> you did, but many mailers don't like those kinds of attachments and
> don't let them go through
>
> (this output only ran 1 iteration because I interrupted it).
>
>
> it is useful to stop the code by setting the maximum number of scf
> iterations to 1. In this way one can see the final time report.
>
> Paolo
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>