<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>
</head>
<body>
<style>
font{
line-height: 1.7;
}
</style>
<div style = 'font-family:"微软雅黑"; font-size: 14px; color:#000000; line-height:1.7;'>
<div>
<div>Dear developers,</div><div><br></div><div> I am testing the parallel scaling of a single SCF run.</div><div> My server has 32 cores. I found that the scaling becomes poor as the number of MPI processes approaches 32. </div><div id="ntes-pcmail-signature" style="font-family:'微软雅黑'"><font style="padding: 0; margin:0;"> </font>
</div> Here is a graph showing how the total CPU time scales with the number of MPI processes: <a href="https://pasteboard.co/GKB7pGK.png" _src="https://pasteboard.co/GKB7pGK.png">https://pasteboard.co/GKB7pGK.png</a></div><div> The best case is, of course, the serial run, which takes 234 sec.</div><div><br> So I tried adding k-point parallelization, hoping it would improve the scalability. <br> Since there are 10 k-points according to scf.out, I ran pw.x as<br> mpiexec.hydra -n 30 pw.x -npool 10 scf.in > scf.out</div><div> But the result is surprising: the run is much slower. It now takes 68.13 sec of wall time, which is effectively 2049 sec of total CPU time.<br> Is this expected? Or is there something wrong with my parallel build? What is the correct way to improve scaling in my case?<br><br> I notice that the "deallocated PAW data for type" message appears many more times than with the default parallelization.</div><div> Below is the parallelization info and other information.</div><div><br></div><div><div>Parallelization info</div><div> --------------------</div><div> sticks: dense smooth PW G-vecs: dense smooth PW</div><div> Min 1630 603 185 48553 10926 1875</div><div> Max 1632 604 186 48554 10927 1876</div><div> Sum 4893 1811 557 145661 32779 5627</div><div><br></div><div>-----</div><div><div>number of k points= 10</div><div> cart. coord. 
in units 2pi/alat</div><div> k( 1) = ( 0.0000000 0.0000000 0.1288649), wk = 0.0625000</div><div> k( 2) = ( -0.5938011 -0.3428312 0.2147749), wk = 0.1875000</div><div> k( 3) = ( 1.1876021 0.6856624 -0.0429550), wk = 0.1875000</div><div> k( 4) = ( 0.5938011 0.3428312 0.0429550), wk = 0.1875000</div><div> k( 5) = ( -0.5938011 0.3428312 0.3006849), wk = 0.1875000</div><div> k( 6) = ( 1.1876021 1.3713248 0.0429550), wk = 0.3750000</div><div> k( 7) = ( 0.5938011 1.0284936 0.1288649), wk = 0.3750000</div><div> k( 8) = ( 1.1876021 -0.6856624 -0.2147749), wk = 0.1875000</div><div> k( 9) = ( 0.0000000 0.0000000 0.3865948), wk = 0.0625000</div><div> k( 10) = ( 1.7814032 1.0284936 0.1288649), wk = 0.1875000</div><div><br></div><div> Dense grid: 145661 G-vectors FFT dimensions: ( 125, 125, 125)</div><div><br></div><div> Smooth grid: 32779 G-vectors FFT dimensions: ( 75, 75, 75)</div><div><br></div><div> Estimated max dynamical RAM per process > 232.31 MB</div><div><br></div><div> Estimated total dynamical RAM > 6.81 GB</div></div><div><br></div><div><div>Initial potential from superposition of free atoms</div><div><br></div><div> starting charge 47.99814, renormalised to 48.00000</div><div> Starting wfc are 30 randomized atomic wfcs</div><div> Checking if some PAW data can be deallocated...</div><div> node 0, deallocated PAW data for type: 2</div><div> node 0, deallocated PAW data for type: 3</div><div> node 1, deallocated PAW data for type: 2</div><div> node 1, deallocated PAW data for type: 3</div><div> node 2, deallocated PAW data for type: 2</div><div> node 2, deallocated PAW data for type: 3</div><div> node 3, deallocated PAW data for type: 2</div><div> node 3, deallocated PAW data for type: 3</div><div> node 4, deallocated PAW data for type: 2</div><div> node 4, deallocated PAW data for type: 3</div><div> node 5, deallocated PAW data for type: 2</div><div> node 5, deallocated PAW data for type: 3</div><div> node 6, deallocated PAW data for type: 2</div><div> node 6, 
deallocated PAW data for type: 3</div><div> node 7, deallocated PAW data for type: 2</div><div> node 7, deallocated PAW data for type: 3</div><div> node 8, deallocated PAW data for type: 2</div><div> node 8, deallocated PAW data for type: 3</div><div> node 9, deallocated PAW data for type: 2</div><div> node 9, deallocated PAW data for type: 3</div><div> node 10, deallocated PAW data for type: 2</div><div> node 10, deallocated PAW data for type: 3</div><div> node 11, deallocated PAW data for type: 2</div><div> node 11, deallocated PAW data for type: 3</div><div> node 12, deallocated PAW data for type: 1</div><div> node 12, deallocated PAW data for type: 2</div><div> node 13, deallocated PAW data for type: 1</div><div> node 13, deallocated PAW data for type: 2</div><div> node 14, deallocated PAW data for type: 1</div><div> node 14, deallocated PAW data for type: 2</div><div> node 15, deallocated PAW data for type: 1</div><div> node 15, deallocated PAW data for type: 2</div><div> node 16, deallocated PAW data for type: 1</div><div> node 16, deallocated PAW data for type: 2</div><div> node 17, deallocated PAW data for type: 1</div><div> node 17, deallocated PAW data for type: 2</div><div> node 18, deallocated PAW data for type: 1</div><div> node 18, deallocated PAW data for type: 2</div><div> node 19, deallocated PAW data for type: 1</div><div> node 19, deallocated PAW data for type: 2</div><div> node 20, deallocated PAW data for type: 1</div><div> node 20, deallocated PAW data for type: 2</div><div> node 21, deallocated PAW data for type: 1</div><div> node 21, deallocated PAW data for type: 2</div><div> node 22, deallocated PAW data for type: 1</div><div> node 22, deallocated PAW data for type: 2</div><div> node 23, deallocated PAW data for type: 1</div><div> node 23, deallocated PAW data for type: 2</div><div> node 24, deallocated PAW data for type: 1</div><div> node 24, deallocated PAW data for type: 3</div><div> node 25, deallocated PAW data for type: 1</div><div> 
node 25, deallocated PAW data for type: 3</div><div> node 26, deallocated PAW data for type: 1</div><div> node 26, deallocated PAW data for type: 3</div><div> node 27, deallocated PAW data for type: 1</div><div> node 27, deallocated PAW data for type: 3</div><div> node 28, deallocated PAW data for type: 1</div><div> node 28, deallocated PAW data for type: 3</div><div> node 29, deallocated PAW data for type: 1</div><div> node 29, deallocated PAW data for type: 3</div><div><br></div><div> total cpu time spent up to now is 4.2 secs</div></div><div><br></div><div><br></div> </div><style type="text/css">
a#ntes-pcmail-signature-default:hover {
text-decoration: underline;
color: #199cff;
cursor: pointer;
}
a#ntes-pcmail-signature-default:active {
text-decoration: underline;
color: #246fce;
cursor: pointer;
}
</style><!--😀-->
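<div><br></div><div> For reference, here is a minimal Python sketch of the arithmetic behind the numbers above. It assumes that the MPI processes are split evenly among the pools and that every process uses the estimated per-process RAM; the helper name scaling_summary is only illustrative and is not part of Quantum ESPRESSO.</div>
<pre>
```python
def scaling_summary(serial_cpu_s, parallel_cpu_s, n_procs, n_pools, ram_per_proc_mb):
    """Summarise a pooled run from the figures quoted in this post (hypothetical helper)."""
    # Assumption: processes are divided evenly among the k-point pools.
    if n_procs % n_pools != 0:
        raise ValueError("n_procs must be a multiple of n_pools")
    return {
        # MPI processes that share the work of one k-point pool
        "procs_per_pool": n_procs // n_pools,
        # fraction of the consumed CPU time that the serial run would have needed
        "efficiency": serial_cpu_s / parallel_cpu_s,
        # total memory estimate, assuming each process uses the quoted per-process figure
        "total_ram_gb": ram_per_proc_mb * n_procs / 1024.0,
    }


# Numbers taken from this post: 234 s serial, 2049 s total CPU time,
# 30 MPI processes, 10 pools, 232.31 MB estimated RAM per process.
summary = scaling_summary(234.0, 2049.0, 30, 10, 232.31)
print(summary["procs_per_pool"])
print(round(summary["efficiency"], 3))
print(round(summary["total_ram_gb"], 2))
```
</pre>
<div> With the numbers from this run it gives 3 processes per pool, a parallel efficiency of about 0.114 relative to the serial run, and a total RAM estimate of 6.81 GB, which agrees with the value printed by pw.x.</div>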
</div>
</body>
</html>