<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
</head>
<body>
<style>
font{
line-height: 1.7;
}
</style>
<div style = 'font-family:"微软雅黑"; font-size: 14px; color:#000000; line-height:1.7;'>
Dear Paolo,<div> Thank you so much for your reply.</div><div> Sorry that my previous post was unclear. <span style="line-height: 1.7;">I will try to state the problem clearly in this one.</span></div><div> My scf.in file is attached at the end of this post.</div><div> </div><div> First, I ran the scf calculation with different numbers of MPI processes, like this:<br></div><div> mpiexec.hydra -n ${mpinum} pw.x -in scf.in > scf.out</div><div><br></div><div> Then I collected the timing reported at the end of scf.out for each number of MPI processes:</div><div><br></div><div><div>1-> PWSCF : 3m47.62s CPU 3m54.05s WALL</div><div>4-> PWSCF : 56.51s CPU 57.83s WALL</div><div>8-> PWSCF : 31.30s CPU 32.78s WALL</div><div>12-> PWSCF : 24.21s CPU 25.06s WALL</div><div>16-> PWSCF : 17.67s CPU 18.60s WALL</div><div>20-> PWSCF : 14.03s CPU 15.26s WALL</div><div>24-> PWSCF : 13.53s CPU 14.44s WALL</div><div>25-> PWSCF : 12.13s CPU 14.05s WALL</div><div>28-> PWSCF : 11.80s CPU 12.69s WALL</div><div>32-> PWSCF : 13.45s CPU 16.12s WALL</div><div><br></div></div><div>A plot of CPU time vs. number of MPI processes is here: <a href="https://pasteboard.co/GKUXhL4.png" _src="https://pasteboard.co/GKUXhL4.png" style="line-height: 1.7;">https://pasteboard.co/GKUXhL4.png</a></div><div>I then define total CPU time = cpu_time x mpi_num. For example, for the run with 32 MPI processes, the total CPU time is 32 x 13.45s = 430.4s.</div><div>A plot of total CPU time vs. number of MPI processes is here: <a href="https://pasteboard.co/GKUYkD4.png" _src="https://pasteboard.co/GKUYkD4.png" style="line-height: 1.7;">https://pasteboard.co/GKUYkD4.png</a></div><div>The scaling is clearly not good: the total CPU time grows from 227.62s on 1 process to 430.4s on 32 processes. With perfect linear scaling this curve would be a horizontal line, am I right?</div><div><br></div><div>So I thought that adding k-point parallelization might give better scaling. 
Since there are 10 k-points, I tried the three cases below:</div><div><br></div><div><span style="line-height: 23.8px;">mpiexec.hydra -n 30 pw.x -npool 2 -in scf.in > scf.out</span></div><div><span style="line-height: 23.8px;">mpiexec.hydra -n 30 pw.x -npool 5 -in scf.in > scf.out</span></div><div><span style="line-height: 23.8px;">mpiexec.hydra -n 30 pw.x -npool 10 -in scf.in > scf.out</span></div><div><span style="line-height: 23.8px;"><br></span></div><div><span style="line-height: 23.8px;">The timing results are:</span></div><div>-npool 2 -> PWSCF : 14.89s CPU 15.88s WALL</div><div>-npool 5 -> PWSCF : 27.45s CPU 28.95s WALL</div><div>-npool 10 -> PWSCF : 0m53.52s CPU 1m 8.13s WALL</div><div><br></div><div>Clearly, the scaling gets much worse with npool parallelization: with the same 30 MPI processes, the run time increases as npool increases. So what is wrong?</div><div><br></div><div>Best regards</div><div><br></div><div><span style="line-height: 1.7;">-----------------</span></div><div>Below is the scf.in file:</div><div><br></div><div><div>&CONTROL</div><div>prefix='bi2se3_mpi',</div><div>calculation='scf',</div><div>restart_mode='from_scratch',</div><div>wf_collect=.true.,</div><div>verbosity='high',</div><div>tstress=.true.,</div><div>tprnfor=.true.,</div><div>forc_conv_thr=1d-4,<span class="Apple-tab-span" style="white-space:pre"> </span></div><div>outdir='./qe_tmpdir',</div><div>pseudo_dir = './pseudo', </div><div>/</div><div>&SYSTEM</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ibrav = 5,</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>celldm(1)=18.59579532204d0,celldm(4)=0.9113725833268d0,</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>nat = 5,ntyp = 3,</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>ecutwfc = 40,ecutrho = 433,</div><div>/</div><div>&ELECTRONS</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>conv_thr = 1.0d-10, </div><div>/</div><div>&IONS</div><div>/</div><div>&CELL</div><div>press_conv_thr=0.1d0<span 
class="Apple-tab-span" style="white-space:pre"> </span></div><div>cell_dofree='all',</div><div>/</div><div>ATOMIC_SPECIES</div><div>Bi <span class="Apple-tab-span" style="white-space:pre"> </span>208.98040 <span class="Apple-tab-span" style="white-space:pre"> </span>Bi.pbe-dn-kjpaw_psl.0.2.2.UPF</div><div>Se1<span class="Apple-tab-span" style="white-space:pre"> </span>78.971<span class="Apple-tab-span" style="white-space:pre"> </span>Se.pbe-n-kjpaw_psl.0.2.UPF</div><div>Se2<span class="Apple-tab-span" style="white-space:pre"> </span>78.971<span class="Apple-tab-span" style="white-space:pre"> </span>Se.pbe-n-kjpaw_psl.0.2.UPF</div><div>ATOMIC_POSITIONS crystal</div><div>Bi<span class="Apple-tab-span" style="white-space:pre"> </span>0.4008d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.4008d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.4008d0</div><div>Bi<span class="Apple-tab-span" style="white-space:pre"> </span>0.5992d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.5992d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.5992d0</div><div>Se2<span class="Apple-tab-span" style="white-space:pre"> </span>0.2117d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.2117d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.2117d0</div><div>Se2<span class="Apple-tab-span" style="white-space:pre"> </span>0.7883d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.7883d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.7883d0</div><div>Se1<span class="Apple-tab-span" style="white-space:pre"> </span>0.d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.d0<span class="Apple-tab-span" style="white-space:pre"> </span>0.d0</div><div>K_POINTS automatic </div><div>4 4 4 1 1 1</div></div><!--😀-->
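<div><br></div><div>P.S. The total CPU time and parallel efficiency discussed above can be recomputed with a short script. This is only a minimal Python sketch: the names cpu_seconds and scaling_table are my own illustrative choices (nothing here comes from Quantum ESPRESSO), and I am assuming the usual definition of parallel efficiency, T(1) / (N x T(N)). The CPU times are copied from the PWSCF lines in this post.</div>
<pre>
```python
# CPU times in seconds, copied from the PWSCF timing lines in this post,
# keyed by the number of MPI processes (3m47.62s = 227.62 s).
cpu_seconds = {
    1: 227.62, 4: 56.51, 8: 31.30, 12: 24.21, 16: 17.67,
    20: 14.03, 24: 13.53, 25: 12.13, 28: 11.80, 32: 13.45,
}


def scaling_table(times):
    """Return (nproc, total_cpu, efficiency) rows.

    total_cpu  = cpu_time x mpi_num, as defined in the post.
    efficiency = T(1) / (N x T(N)), an assumed (standard) definition;
                 it equals 1.0 for perfect linear scaling.
    """
    serial = times[1]
    rows = []
    for nproc in sorted(times):
        total_cpu = nproc * times[nproc]
        efficiency = serial / total_cpu
        rows.append((nproc, total_cpu, efficiency))
    return rows


for nproc, total_cpu, efficiency in scaling_table(cpu_seconds):
    print(f"{nproc:3d} procs: total CPU {total_cpu:7.2f} s, "
          f"efficiency {efficiency:6.1%}")
```
</pre>
<div>With perfect linear scaling, the total CPU time would stay constant and the efficiency would stay at 100% for every process count.</div>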
</div>
</body>
</html>