Dear Paolo, Axel, and everyone,<br><br>Regarding the timings of 4.1 and 4.0.4, yesterday I sent the outputs produced by both versions.<br>
Because of the size of the attachments, that message is awaiting
moderation. I am not sure whether I can send attachments to the forum :-(. In
any case, below is a summary of the differences between the outputs. I see that
4.1 took 8 SCF iterations while 4.0.4 took 9, but 4.0.4 finished in a shorter time.
Here is a collection of the most noticeable differences. I hope it helps with <br>
tuning.<br>
<br>
< Program PWSCF v.4.0.4 starts ...<br>
< Today is 25Jul2009 at 11:19:38 <br>
---<br>
> Program PWSCF v.4.1 starts ...<br>
> Today is 26Jul2009 at 12:15:35 <br>
16,26c16<br>
< Found additional translation: 0.0000 0.0000 0.1000<br>
< Found additional translation: 0.0000 0.0000 0.2000<br>
< Found additional translation: 0.0000 0.0000 0.3000<br>
< Found additional translation: 0.0000 0.0000 0.4000<br>
< Found additional translation: 0.0000 0.0000 -0.5000<br>
< Found additional translation: 0.0000 0.0000 -0.4000<br>
< Found additional translation: 0.0000 0.0000 -0.3000<br>
< Found additional translation: 0.0000 0.0000 -0.2000<br>
< Found additional translation: 0.0000 0.0000 -0.1000<br>
< <br>
< Iterative solution of the eigenvalue problem<br>
---<br>
> Waiting for input...<br>
27a18<br>
> Subspace diagonalization in iterative solution of the eigenvalue problem:<br>
31a23,24<br>
> Found symmetry operation: I + ( 0.0000 0.0000 0.1000)<br>
> This is a supercell, fractional translation are disabled<br>
<br>
<br>
1389c1370<br>
< convergence has been achieved in 9 iterations<br>
---<br>
> convergence has been achieved in 8 iterations<br>
1393c1374<br>
< PWSCF : 30m58.57s CPU time, 36m 1.09s wall time<br>
---<br>
> PWSCF : 34m36.46s CPU time, 39m42.87s wall time<br>
1395,1396c1376,1377<br>
< init_run : 37.43s CPU<br>
< electrons : 1818.18s CPU<br>
---<br>
> init_run : 38.88s CPU<br>
> electrons : 2034.45s CPU<br>
1399,1401c1380,1382<br>
< wfcinit : 36.15s CPU<br>
< potinit : 0.36s CPU<br>
< realus : 0.20s CPU<br>
---<br>
> wfcinit : 37.84s CPU<br>
> potinit : 0.37s CPU<br>
> realus : 0.18s CPU<br>
1404,1408c1385,1389<br>
< c_bands : 1690.30s CPU ( 10 calls, 169.030 s avg)<br>
< sum_band : 126.80s CPU ( 10 calls, 12.680 s avg)<br>
< v_of_rho : 0.35s CPU ( 10 calls, 0.035 s avg)<br>
< newd : 0.07s CPU ( 10 calls, 0.007 s avg)<br>
< mix_rho : 0.06s CPU ( 10 calls, 0.006 s avg)<br>
---<br>
> c_bands : 1920.64s CPU ( 9 calls, 213.404 s avg)<br>
> sum_band : 112.93s CPU ( 9 calls, 12.547 s avg)<br>
> v_of_rho : 0.30s CPU ( 9 calls, 0.033 s avg)<br>
> newd : 0.06s CPU ( 9 calls, 0.007 s avg)<br>
> mix_rho : 0.04s CPU ( 9 calls, 0.005 s avg)<br>
1411,1412c1392,1393<br>
< init_us_2 : 3.62s CPU ( 714 calls, 0.005 s avg)<br>
< cegterg : 1677.69s CPU ( 340 calls, 4.934 s avg)<br>
---<br>
> init_us_2 : 3.22s CPU ( 646 calls, 0.005 s avg)<br>
> cegterg : 1909.07s CPU ( 306 calls, 6.239 s avg)<br>
1415,1418c1396,1399<br>
< h_psi : 736.21s CPU ( 1487 calls, 0.495 s avg)<br>
< s_psi : 79.33s CPU ( 1487 calls, 0.053 s avg)<br>
< g_psi : 2.70s CPU ( 1113 calls, 0.002 s avg)<br>
< cdiaghg : 605.99s CPU ( 1419 calls, 0.427 s avg)<br>
---<br>
> h_psi : 695.24s CPU ( 1661 calls, 0.419 s avg)<br>
> s_psi : 72.59s CPU ( 1661 calls, 0.044 s avg)<br>
> g_psi : 2.17s CPU ( 1321 calls, 0.002 s avg)<br>
> cdiaghg : 893.34s CPU ( 1593 calls, 0.561 s avg)<br>
1421c1402<br>
< add_vuspsi : 78.47s CPU ( 1487 calls, 0.053 s avg)<br>
---<br>
> add_vuspsi : 73.58s CPU ( 1661 calls, 0.044 s avg)<br>
1424,1427c1405,1408<br>
< calbec : 126.54s CPU ( 1827 calls, 0.069 s avg)<br>
< cft3s : 633.97s CPU ( 505734 calls, 0.001 s avg)<br>
< interpolate : 0.09s CPU ( 20 calls, 0.005 s avg)<br>
< davcio : 0.02s CPU ( 1054 calls, 0.000 s avg)<br>
---<br>
> calbec : 127.38s CPU ( 1967 calls, 0.065 s avg)<br>
> cft3s : 583.31s CPU ( 468124 calls, 0.001 s avg)<br>
> interpolate : 0.08s CPU ( 18 calls, 0.005 s avg)<br>
> davcio : 0.02s CPU ( 952 calls, 0.000 s avg)<br>
1430c1411<br>
< fft_scatter : 411.50s CPU ( 505734 calls, 0.001 s avg)<br>
---<br>
> fft_scatter : 396.13s CPU ( 468124 calls, 0.001 s avg)<br>
<br>
<br>
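To see where the extra time goes, the per-routine CPU totals in the diff above can be compared directly. What follows is only a minimal sketch in Python: the numbers are copied by hand from the two outputs, and the names `cpu_seconds` and `cpu_deltas` are my own, not part of PWscf.<br>

```python
# CPU seconds copied by hand from the diff above: (v4.0.4, v4.1).
# The table and the helper below are illustrative, not part of PWscf.
cpu_seconds = {
    "c_bands": (1690.30, 1920.64),
    "cegterg": (1677.69, 1909.07),
    "h_psi": (736.21, 695.24),
    "cdiaghg": (605.99, 893.34),
    "cft3s": (633.97, 583.31),
    "sum_band": (126.80, 112.93),
}


def cpu_deltas(table):
    """Return {routine: (delta_seconds, percent_change)} going from old to new."""
    return {
        name: (round(new - old, 2), round(100.0 * (new - old) / old, 1))
        for name, (old, new) in table.items()
    }


deltas = cpu_deltas(cpu_seconds)

# Print the routines sorted by increase in CPU time, largest first.
for name, (delta, percent) in sorted(deltas.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:10s} {delta:+9.2f} s  {percent:+6.1f} %")
```

With these numbers, cdiaghg shows the largest increase (about 287 s), which is more than the whole increase in c_bands (about 230 s), while h_psi and cft3s are slightly faster in 4.1.<br>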
Note that I repeated the tests, and the timings change by less than 10 seconds in 35 min.
<br>
Best regards,<br><br clear="all"><br>-- <br>Eduardo Menendez<br>Departamento de Fisica<br>Facultad de Ciencias<br>Universidad de Chile<br>Phone: (56)(2)9787439<br>URL: <a href="http://fisica.ciencias.uchile.cl/~emenendez">http://fisica.ciencias.uchile.cl/~emenendez</a><br>