[Pw_forum] why k point parallelization -npool is so slow?
Paolo Giannozzi
p.giannozzi at gmail.com
Wed Sep 20 09:10:54 CEST 2017
It seems to me that scaling is quite good up to 16-20 processors for
plane-wave parallelization; it is not easy to obtain better results. The
effectiveness of k-point parallelization depends a lot on how much the
k-point-independent parts of the code weigh on the overall performance. In
this specific case, k-point parallelization is not as good as it could be.
Improving it requires working on the code.
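For example (the task counts and pool choice below are purely illustrative,
not tested on your input): with 10 k points, the two levels can be combined
so that each pool still keeps a reasonable number of tasks for the plane-wave
distribution, e.g. 2 pools of 10 plane-wave tasks each:

mpiexec.hydra -n 20 pw.x -npool 2 -in scf.in > scf.out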
Paolo
On Mon, Sep 18, 2017 at 12:02 PM, balabi <balabi at qq.com> wrote:
> Dear Paolo,
> Thank you so much for your reply.
> Sorry that my previous post was unclear. I will try to make my question
> clearer in this post.
> At the end of this post, I attached my scf.in file.
>
> First, I ran the scf calculation with different numbers of MPI processes, like this:
> mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out
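>
> For reference, a minimal shell loop for these runs (the per-run output
> names scf_${mpinum}.out are only illustrative; every run uses the same
> scf.in):
>
> for mpinum in 1 4 8 12 16 20 24 25 28 32; do
>     mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf_${mpinum}.out
> done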
>
> Then I collected the timing line at the end of scf.out for each number of
> MPI processes:
>
> 1-> PWSCF : 3m47.62s CPU 3m54.05s WALL
> 4-> PWSCF : 56.51s CPU 57.83s WALL
> 8-> PWSCF : 31.30s CPU 32.78s WALL
> 12-> PWSCF : 24.21s CPU 25.06s WALL
> 16-> PWSCF : 17.67s CPU 18.60s WALL
> 20-> PWSCF : 14.03s CPU 15.26s WALL
> 24-> PWSCF : 13.53s CPU 14.44s WALL
> 25-> PWSCF : 12.13s CPU 14.05s WALL
> 28-> PWSCF : 11.80s CPU 12.69s WALL
> 32-> PWSCF : 13.45s CPU 16.12s WALL
>
> The plot of CPU time vs. number of MPI processes is here:
> https://pasteboard.co/GKUXhL4.png
> Then I define total CPU time = CPU time x number of MPI processes; for
> example, for the 32-process run, the total CPU time is 32 x 13.45 s = 430.4 s.
> The plot of total CPU time vs. number of MPI processes is here:
> https://pasteboard.co/GKUYkD4.png
> We can see that the scaling is not good. With perfect linear scaling, the
> total CPU time curve should be a horizontal line, am I right?
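>
> As a worked number (using the CPU times from the table above), the parallel
> efficiency can be written as
>
>   efficiency(N) = T(1) / ( N x T(N) )
>
> so for N = 32: 227.62 s / (32 x 13.45 s) = 227.62 / 430.4 ~ 0.53, i.e. only
> about 53% efficiency, whereas perfect linear scaling would keep the total
> CPU time flat at T(1) = 227.62 s and the efficiency at 1.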
>
> So I thought that adding k-point parallelization might give better scaling.
> Since there are 10 k points, I tried the three cases below:
>
> mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out
> mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out
> mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out
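>
> For reference, with 30 MPI processes these choices correspond to 30/2 = 15,
> 30/5 = 6, and 30/10 = 3 plane-wave tasks per pool, respectively (the number
> of k points used by pw.x can be checked in scf.out, e.g. with something like
> grep 'number of k points' scf.out).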
>
> The timing results are:
> -npool 2 -> PWSCF : 14.89s CPU 15.88s WALL
> -npool 5 -> PWSCF : 27.45s CPU 28.95s WALL
> -npool 10 -> PWSCF : 0m53.52s CPU 1m 8.13s WALL
>
> Clearly, the scaling is much worse with npool parallelization. So what is
> wrong?
>
> best regards
>
> -----------------
> Below is the scf.in file:
>
> &CONTROL
> prefix='bi2se3_mpi',
> calculation='scf',
> restart_mode='from_scratch',
> wf_collect=.true.,
> verbosity='high',
> tstress=.true.,
> tprnfor=.true.,
> forc_conv_thr=1d-4,
> outdir='./qe_tmpdir',
> pseudo_dir = './pseudo',
> /
> &SYSTEM
> ibrav = 5,
> celldm(1)=18.59579532204d0,celldm(4)=0.9113725833268d0,
> nat = 5,ntyp = 3,
> ecutwfc = 40,ecutrho = 433,
> /
> &ELECTRONS
> conv_thr = 1.0d-10,
> /
> &IONS
> /
> &CELL
> press_conv_thr=0.1d0
> cell_dofree='all',
> /
> ATOMIC_SPECIES
> Bi 208.98040 Bi.pbe-dn-kjpaw_psl.0.2.2.UPF
> Se1 78.971 Se.pbe-n-kjpaw_psl.0.2.UPF
> Se2 78.971 Se.pbe-n-kjpaw_psl.0.2.UPF
> ATOMIC_POSITIONS crystal
> Bi 0.4008d0 0.4008d0 0.4008d0
> Bi 0.5992d0 0.5992d0 0.5992d0
> Se2 0.2117d0 0.2117d0 0.2117d0
> Se2 0.7883d0 0.7883d0 0.7883d0
> Se1 0.d0 0.d0 0.d0
> K_POINTS automatic
> 4 4 4 1 1 1
>
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222