[Pw_forum] Nonlinear scaling with pool parallelization

Tue Apr 5 21:54:02 CEST 2011

On Apr 5, 2011, at 19:54 , Markus Meinert wrote:

> I used an _unshifted_ k-mesh

it doesn't matter if it is shifted or unshifted: only the number of
k-points matters for k-point parallelization.

> The slab has 20 k points.

20 k-points on 3 processors = 7+7+6: load balancing is not ideal.
This is likely to be a minor factor, though.
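
To put a number on the load imbalance, here is a minimal Python sketch. It
assumes the k-points are split as evenly as possible over the pools, which is
consistent with the 7+7+6 split above; the function names are made up for
illustration and are not part of any code in the distribution.

```python
def kpoints_per_pool(n_kpoints, n_pools):
    """Split n_kpoints as evenly as possible over n_pools (assumed scheme)."""
    base, extra = divmod(n_kpoints, n_pools)
    # The first `extra` pools receive one additional k-point.
    return [base + 1 if i < extra else base for i in range(n_pools)]


def load_balance_efficiency(n_kpoints, n_pools):
    """Ideal work per pool divided by the work of the busiest pool."""
    counts = kpoints_per_pool(n_kpoints, n_pools)
    return n_kpoints / (n_pools * max(counts))


print(kpoints_per_pool(20, 3))                    # [7, 7, 6]
print(round(load_balance_efficiency(20, 3), 3))   # 0.952
```

With 20 k-points on 3 pools, the busiest pool does 7 k-points instead of the
ideal 6.67, so the loss from imbalance alone is about 5%.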

> But, since a single iteration takes about 100 seconds, I do not
> see where the time is being spent, when the k points are independent.

you do not see because you do not know where to look. Not that it
is explained anywhere... Have a look at the final report:
* the time spent in "c_bands" and called routines is proportional to the
   number of k-points, so it will scale linearly with the number of
   "k-point pools"
* the time spent in "sum_band" is only in part proportional to the
   number of k-points and will partially scale
* the time spent in "v_of_rho", "newd", "mix_rho" is independent of the
   number of k-points and will not scale at all
* k-point parallelization does not reduce memory
* The rest is usually irrelevant
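
The points above can be turned into a rough estimate of the time per
iteration as a function of the number of pools. The following Python sketch is
only an illustration: the timings are invented (take the real ones from the
final report), the fraction of "sum_band" that is proportional to the number
of k-points is a guess, and perfect load balancing is assumed.

```python
def estimated_time(timings, n_pools, sum_band_kpoint_fraction=0.5):
    """Rough time per iteration with n_pools k-point pools.

    timings: seconds measured with a single pool, keyed by routine name.
    sum_band_kpoint_fraction: assumed share of "sum_band" that is
    proportional to the number of k-points (a guess, not a measured value).
    """
    # "c_bands" is proportional to the number of k-points per pool.
    c_bands = timings["c_bands"] / n_pools
    # Only part of "sum_band" is divided among the pools.
    f = sum_band_kpoint_fraction
    sum_band = timings["sum_band"] * (f / n_pools + (1.0 - f))
    # These routines do not depend on the number of k-points.
    fixed = timings["v_of_rho"] + timings["newd"] + timings["mix_rho"]
    return c_bands + sum_band + fixed


# Hypothetical single-pool timings in seconds, for illustration only.
timings = {"c_bands": 60.0, "sum_band": 20.0,
           "v_of_rho": 8.0, "newd": 8.0, "mix_rho": 4.0}

t1 = estimated_time(timings, 1)
for n_pools in (1, 2, 4):
    t = estimated_time(timings, n_pools)
    print(n_pools, "pools:", round(t, 1), "s, speedup", round(t1 / t, 2))
```

With these made-up numbers, the part that does not scale dominates quickly:
going from 1 to 4 pools gives a speedup of only about 2.
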
Also note that
* FFT parallelization distributes most memory
* FFT parallelization speeds up (with varying efficiency) almost all
   routines, with the exception of "cdiaghg" or "rdiaghg"
* linear-algebra parallelization (that you are not using) will (not
   always) speed up "cdiaghg" or "rdiaghg" and distribute more memory
All clear?

P.
---
Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222