[Pw_forum] -nimage -npool -ntg -ndiag
Paolo Giannozzi
giannozz at democritos.it
Sun Apr 26 18:36:34 CEST 2009
Hi Eduardo
> mpirun -np 4096 ./pw.x -nimage 8 -npool 2 -ntg 8 -ndiag 144 -input
> myinput.in
Presently, -nimage is relevant only for NEB calculations; -npool, only in
the presence of k-points. In the following I am assuming nimage=npool=1.
> I would like to see some hints, in addition to what is reproduced
> below (from the users guide), about the good choices of -ntg and
> -ndiag
-ntg is useful if the number of processors you want to use is comparable
to or larger than nr3s (the FFT dimension along axis 3). In that case it
may be worthwhile to perform FFTs in parallel on ntg groups of electronic
states, parallelizing each FFT over np/ntg processors.
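
The arithmetic behind -ntg can be sketched as follows. This is only an
illustration, not taken from the pw.x source: the function names, the
requirement that ntg divides np evenly, the strict np >= nr3s cutoff (the
criterion above is only approximate), and the value nr3s=128 are all
assumptions made for the example.

```python
def fft_procs_per_task_group(nproc, ntg):
    """Processors sharing each FFT when nproc are split into ntg task groups."""
    if ntg < 1 or nproc % ntg != 0:
        # Assumption for this sketch: ntg must divide nproc evenly.
        raise ValueError("ntg must be a positive divisor of nproc")
    return nproc // ntg


def task_groups_may_help(nproc, nr3s):
    """Crude reading of 'np comparable to or larger than nr3s'."""
    # Assumption: "comparable" is cut off at nproc >= nr3s; the real
    # criterion is fuzzier than a hard threshold.
    return nproc >= nr3s


if __name__ == "__main__":
    nproc, ntg, nr3s = 4096, 8, 128  # nr3s is a hypothetical grid dimension
    print("task groups worth trying:", task_groups_may_help(nproc, nr3s))
    print("processors per FFT:", fft_procs_per_task_group(nproc, ntg))
```

With nimage=npool=1, the 4096 processors of the quoted command line split
into 8 task groups would leave 512 processors for each FFT.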
-ndiag should be such that ndiag^2 <= np. The default is the largest
possible ndiag; the ideal choice is the one that gives the best
performance, and it depends on many factors (dimension of the matrices to
be manipulated, communication hardware, phase of the moon...). The larger
ndiag, the smaller the memory usage (all matrices that are manipulated are
distributed across ndiag^2 processors) and the smaller the number of
floating-point operations per processor, but eventually the overall
performance will flatten and then degrade due to communication overhead.
Don't even think of trying this on slow communication hardware.
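
Taking the rule ndiag^2 <= np exactly as stated here, the upper bound on
ndiag and the memory saving can be sketched as below. The function names,
the matrix dimension n=2000, and the assumption of an even square-block
distribution are mine, chosen for illustration; the actual data layout in
the code may differ.

```python
import math


def largest_ndiag(nproc):
    """Largest integer ndiag satisfying ndiag**2 <= nproc."""
    return math.isqrt(nproc)


def matrix_elements_per_proc(n, ndiag):
    """Elements of an n x n matrix held by one processor of an ndiag x ndiag grid."""
    # Assumption: square blocks of side ceil(n / ndiag) on every processor.
    block = -(-n // ndiag)  # ceiling division
    return block * block


if __name__ == "__main__":
    nproc = 4096
    n = 2000  # hypothetical matrix dimension
    print("upper bound on ndiag:", largest_ndiag(nproc))
    for ndiag in (1, 4, 12, largest_ndiag(nproc)):
        print(ndiag, matrix_elements_per_proc(n, ndiag))
```

The per-processor storage falls roughly as 1/ndiag^2, which is the memory
argument above; the sketch says nothing about the communication cost that
eventually dominates, so the best value still has to be found by timing.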
Some time ago I was asked to write the proceedings for some Italian
conference on high-performance computing. As a rule I don't write anything
that doesn't contain something new or at least something useful. The
result of my and Carlo Cavazzoni's effort to reduce the waste of bits is
the following small paper:
http://www.fisica.uniud.it/~giannozz/Papers/rimini08.pdf
containing some information on parallelization levels in quantum-espresso.
Whether it is useful or not, the reader will judge.
Paolo