[QE-users] [SUSPECT ATTACHMENT REMOVED] inefficient parallelization of SCF calculation

JULIEN, CLAUDE, PIERRE BARBAUD julien_barbaud at sjtu.edu.cn
Wed Apr 10 09:08:01 CEST 2019


Dear users,

I am starting to use my university's HPC cluster, but I am very new to
parallel computation.

I made a first test (test #1) on a very small simulation (relaxation of a
GO sheet with 19 atoms, at the Gamma point only). The calculation took
3m20s on 1 processor on my personal computer. On the cluster it took 1m5s
on 4 processors with the default parallel options, and 44s on 8 processors.
This seems like reasonable behavior, and at least shows that increasing the
number of processors does reduce the computation time in this case (with
obvious limits if too many processors are used for such a small job).

However, I tried another, somewhat bigger test (test #2): an SCF
calculation with 120 atoms (still at the Gamma point only). In this case,
parallelization brings no improvement at all. Although the output file
confirms that the code is running on N processors, the performance is about
the same as on 1 processor (sometimes even slightly worse, though probably
not significantly, since the timings fluctuate a bit from one run to
another).

I ran this same input file on my personal computer on both 1 and 2 cores:
it takes 10376 s to run 10 iterations on 1 core and 6777 s on 2 cores, so
parallelization seems to work fine for this input on my computer.

I have tried running with different numbers of cores on the HPC cluster
and with different parallelization options (for instance -nb 4), but
nothing seems to improve the time. A couple of example command lines are
sketched below.
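
To be concrete, the kind of command line I mean is roughly the following
(the processor count and the input/output file names here are only
placeholders, not necessarily what my actual script uses):

    # default parallelization, 8 MPI processes
    mpirun -np 8 pw.x -in bonding.scf.in > bonding.scf.out

    # same run, but splitting the work over 4 band groups
    mpirun -np 8 pw.x -nb 4 -in bonding.scf.in > bonding.scf.out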

Basically, I am stuck with these two seemingly conflicting facts:

* Parallelization itself does not seem to be a problem on the HPC cluster,
since test #1 scales well there.

* The input file of test #2 does not seem to be a problem either, since it
scales reasonably with the number of cores on my personal computer.

However, combining the two, i.e. running this input file in parallel on the
HPC cluster, does not work correctly.

I have attached the input and output files of test #2 (the output contains
only 1 iteration because I interrupted the run). I also included the slurm
script I use to submit the calculation to the job manager, in case it helps
(bonding.scf.slurm); a rough sketch of its structure is given below.
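
Since attachments sometimes get stripped from the list, here is roughly
what that slurm script looks like (the partition, module name and resource
values below are placeholders, not the exact ones I use):

    #!/bin/bash
    #SBATCH --job-name=bonding.scf
    #SBATCH --partition=cpu            # placeholder partition name
    #SBATCH --ntasks=8                 # number of MPI processes
    #SBATCH --time=12:00:00

    module load quantum-espresso       # placeholder module name
    srun pw.x -in bonding.scf.in > bonding.scf.out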

Any suggestion on what is going wrong would be very welcome.

Thanks in advance, 

Julien
