<div dir="ltr"><div>It seems to me that scaling is quite good up to 16-20 processors for plane-wave parallelization. It is not easy to obtain better results. The effectiveness of k-point parallelization depends a lot on how much k-point-independent parts of the code weigh on the overall performances. In this specific case, k-point parallelization is not as good as it could be. Improving it requires to work on the code.<br><br></div>Paolo<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 18, 2017 at 12:02 PM, balabi <span dir="ltr"><<a href="mailto:balabi@qq.com" target="_blank">balabi@qq.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Mon, Sep 18, 2017 at 12:02 PM, balabi <balabi@qq.com> wrote:
Dear Paolo,

Thank you so much for your reply, and sorry for my previous unclear post; I will try to state the problem more clearly here. At the end of this post I attach my scf.in file.

First, I ran the scf calculation with different numbers of MPI processes:

    mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out

and collected the timing line at the end of scf.out for each run:

     1 -> PWSCF : 3m47.62s CPU   3m54.05s WALL
     4 -> PWSCF :   56.51s CPU     57.83s WALL
     8 -> PWSCF :   31.30s CPU     32.78s WALL
    12 -> PWSCF :   24.21s CPU     25.06s WALL
    16 -> PWSCF :   17.67s CPU     18.60s WALL
    20 -> PWSCF :   14.03s CPU     15.26s WALL
    24 -> PWSCF :   13.53s CPU     14.44s WALL
    25 -> PWSCF :   12.13s CPU     14.05s WALL
    28 -> PWSCF :   11.80s CPU     12.69s WALL
    32 -> PWSCF :   13.45s CPU     16.12s WALL

A plot of CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUXhL4.png

I then define total CPU time = CPU time x number of MPI processes; for example, for the 32-process run the total CPU time is 32 x 13.45 s = 430.4 s. A plot of total CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUYkD4.png

We can see that the scaling is not good. For perfect linear scaling this curve should be a horizontal line, am I right?
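For reference, a small Python sketch that converts the wall times quoted above into speedup and parallel efficiency (efficiency = 1.0 would be perfect linear scaling); the numbers are simply copied from the list above:

    # Wall times in seconds, keyed by number of MPI processes (from the runs above).
    walltimes = {1: 234.05, 4: 57.83, 8: 32.78, 12: 25.06, 16: 18.60,
                 20: 15.26, 24: 14.44, 25: 14.05, 28: 12.69, 32: 16.12}

    t1 = walltimes[1]
    for n, t in sorted(walltimes.items()):
        speedup = t1 / t
        efficiency = speedup / n   # 1.0 means perfect linear scaling
        print(f"{n:3d} procs: speedup {speedup:6.2f}, efficiency {efficiency:.2f}")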
So I thought that adding k-point parallelization might give better scaling. Since there are 10 k-points, I tried the three cases below:

    mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out
    mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out
    mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out

The timing results are:

    -npool 2  -> PWSCF :   14.89s CPU   15.88s WALL
    -npool 5  -> PWSCF :   27.45s CPU   28.95s WALL
    -npool 10 -> PWSCF : 0m53.52s CPU  1m 8.13s WALL

Clearly, the scaling is much worse with npool parallelization. So what is wrong?

best regards
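Just for bookkeeping, a minimal Python sketch of how the 30 MPI processes and the 10 k-points mentioned above would be divided among the pools in these three runs; it assumes a simple even split and is only meant to show how few processes remain for plane-wave/FFT parallelization inside each pool:

    import math

    nproc, nkpt = 30, 10   # 30 MPI processes, 10 k-points (numbers from this post)

    for npool in (2, 5, 10):
        procs_per_pool = nproc // npool           # processes available within one pool
        kpts_per_pool  = math.ceil(nkpt / npool)  # k-points handled by the busiest pool
        print(f"-npool {npool:2d}: {procs_per_pool:2d} procs per pool, "
              f"up to {kpts_per_pool:2d} k-points per pool")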
style="white-space:pre-wrap"> </span>0.d0<span class="m_-9068117294776853458Apple-tab-span" style="white-space:pre-wrap"> </span>0.d0<span class="m_-9068117294776853458Apple-tab-span" style="white-space:pre-wrap"> </span>0.d0</div><div>K_POINTS automatic </div><div>4 4 4 1 1 1</div></div>
_______________________________________________
Pw_forum mailing list
Pw_forum@pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222