<div dir="ltr"><div>It seems to me that scaling is quite good up to 16-20 processors for plane-wave parallelization. It is not easy to obtain better results. The effectiveness of k-point parallelization depends a lot on how much k-point-independent parts of the code weigh on the overall performances. In this specific case, k-point parallelization is not as good as it could be. Improving it requires to work on the code.<br><br></div>Paolo<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 18, 2017 at 12:02 PM, balabi <span dir="ltr"><<a href="mailto:balabi@qq.com" target="_blank">balabi@qq.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Mon, Sep 18, 2017 at 12:02 PM, balabi <balabi@qq.com> wrote:
Dear Paolo,

Thank you so much for your reply, and sorry for my previous unclear post; I will try to state the problem more clearly here. At the end of this post I attach my scf.in file.

First, I ran the scf calculation with different numbers of MPI processes:

    mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out

and collected the timing line at the end of scf.out for each run:

     1 -> PWSCF : 3m47.62s CPU   3m54.05s WALL
     4 -> PWSCF :   56.51s CPU     57.83s WALL
     8 -> PWSCF :   31.30s CPU     32.78s WALL
    12 -> PWSCF :   24.21s CPU     25.06s WALL
    16 -> PWSCF :   17.67s CPU     18.60s WALL
    20 -> PWSCF :   14.03s CPU     15.26s WALL
    24 -> PWSCF :   13.53s CPU     14.44s WALL
    25 -> PWSCF :   12.13s CPU     14.05s WALL
    28 -> PWSCF :   11.80s CPU     12.69s WALL
    32 -> PWSCF :   13.45s CPU     16.12s WALL

A plot of CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUXhL4.png

I then define total CPU time = CPU time x number of MPI processes; for example, for the 32-process run the total CPU time is 32 x 13.45 s = 430.4 s. A plot of total CPU time vs. number of MPI processes is here: https://pasteboard.co/GKUYkD4.png

We can see that the scaling is not good. For perfect linear scaling this curve should be a horizontal line, am I right?
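For reference, a small Python sketch that converts the wall times quoted above into speedup and parallel efficiency (efficiency = 1.0 would be perfect linear scaling); the numbers are simply copied from the list above:

    # Wall times in seconds, keyed by number of MPI processes (from the runs above).
    walltimes = {1: 234.05, 4: 57.83, 8: 32.78, 12: 25.06, 16: 18.60,
                 20: 15.26, 24: 14.44, 25: 14.05, 28: 12.69, 32: 16.12}

    t1 = walltimes[1]
    for n, t in sorted(walltimes.items()):
        speedup = t1 / t
        efficiency = speedup / n   # 1.0 means perfect linear scaling
        print(f"{n:3d} procs: speedup {speedup:6.2f}, efficiency {efficiency:.2f}")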
So I thought that adding k-point parallelization might give better scaling. Since there are 10 k-points, I tried the three cases below:

    mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out
    mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out
    mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out

The timing results are:

    -npool 2  -> PWSCF :   14.89s CPU   15.88s WALL
    -npool 5  -> PWSCF :   27.45s CPU   28.95s WALL
    -npool 10 -> PWSCF : 0m53.52s CPU  1m 8.13s WALL

Clearly, the scaling is much worse with npool parallelization. So what is wrong?

best regards
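Just for bookkeeping, a minimal Python sketch of how the 30 MPI processes and the 10 k-points mentioned above would be divided among the pools in these three runs; it assumes a simple even split and is only meant to show how few processes remain for plane-wave/FFT parallelization inside each pool:

    import math

    nproc, nkpt = 30, 10   # 30 MPI processes, 10 k-points (numbers from this post)

    for npool in (2, 5, 10):
        procs_per_pool = nproc // npool           # processes available within one pool
        kpts_per_pool  = math.ceil(nkpt / npool)  # k-points handled by the busiest pool
        print(f"-npool {npool:2d}: {procs_per_pool:2d} procs per pool, "
              f"up to {kpts_per_pool:2d} k-points per pool")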
style="white-space:pre-wrap"> </span>0.d0<span class="m_-9068117294776853458Apple-tab-span" style="white-space:pre-wrap"> </span>0.d0<span class="m_-9068117294776853458Apple-tab-span" style="white-space:pre-wrap"> </span>0.d0</div><div>K_POINTS automatic </div><div>4 4 4 1 1 1</div></div>
_______________________________________________
Pw_forum mailing list
Pw_forum@pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum

--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222