[Pw_forum] timing vs QE version vs pools
Giovanni.Cantele at na.infn.it
Tue Mar 10 16:47:24 CET 2009
>> - it seems that QE 3.2.3 always performs a little bit better than
>> any hint on what (if any) is wrong in what I'm doing?
> assuming that the two versions are compiled with the same options
> and libraries: please check where the difference comes from, in the
> cpu time report at the end of each calculation
Let's consider one of the tests, namely 16 cpus, 64 k-points, 1 pool (in
the case of 4.0.4 I used -ndiag 1 to get rid of any effect/difference
coming from the diagonalization parallelism).
These are the relevant (that is, showing the largest differences) results:
                     3.2.3        4.0.4      4.0.4 - 3.2.3 (s)
total CPU time:    8m49.79s     9m26.57s          36.78
init_run             44.22s       42.89s          -1.33
electrons           485.50s      522.73s          37.23
c_bands             426.96s      464.64s          37.68
cegterg             422.70s      458.23s          35.53
h_psi               286.17s      299.63s          13.46
diaghg               59.16s       64.07s           4.91
cft3s               240.42s      245.56s           5.14
fft_scatter         142.73s      124.84s         -17.89
so, it seems that the main difference is just in the
diagonalization-related routines, right?
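For anyone who wants to repeat this comparison on their own runs, here is a minimal Python sketch that converts the reported times to seconds and recomputes the differences. The numbers are copied from the table above; the helper names (to_seconds, differences) are mine and are not part of QE.

```python
import re

def to_seconds(t):
    """Convert a time such as '8m49.79s' or '44.22s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?\s*([\d.]+)s", t.strip())
    if m is None:
        raise ValueError(f"unrecognized time: {t!r}")
    minutes = int(m.group(1) or 0)
    return 60 * minutes + float(m.group(2))

# (routine, 3.2.3 time, 4.0.4 time), copied from the table above
timings = [
    ("total CPU time", "8m49.79s", "9m26.57s"),
    ("init_run", "44.22s", "42.89s"),
    ("electrons", "485.50s", "522.73s"),
    ("c_bands", "426.96s", "464.64s"),
    ("cegterg", "422.70s", "458.23s"),
    ("h_psi", "286.17s", "299.63s"),
    ("diaghg", "59.16s", "64.07s"),
    ("cft3s", "240.42s", "245.56s"),
    ("fft_scatter", "142.73s", "124.84s"),
]

def differences(rows):
    """Return (routine, 4.0.4 - 3.2.3 in s, difference in % of the 3.2.3 time)."""
    out = []
    for name, old, new in rows:
        a, b = to_seconds(old), to_seconds(new)
        out.append((name, round(b - a, 2), round(100 * (b - a) / a, 1)))
    return out

# Print the routines ordered from largest slowdown to largest speedup.
for name, diff, pct in sorted(differences(timings), key=lambda r: -r[1]):
    print(f"{name:16s} {diff:+8.2f} s  {pct:+6.1f} %")
```

Keep in mind that the routines in the report are nested inside each other, so the individual differences are not meant to add up to the total.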
>> - it seems that -ndiag 1 (serial algorithm for the iterative
>> solution of
>> the eigenvalue problem) always performs a little bit better than the
>> default (code) choice. I attribute this to the fact that only for VERY
>> LARGE number of electrons this may give a difference, is that right?
> VERY LARGE maybe not, but you will gain (or lose) very little unless
> you have let's say several hundreds electronic states
I'll run more tests as soon as possible; if I lose only a little, that
is OK. In the above test, turning parallel diagonalization "off" gave
9m26.57s CPU time, against 10m51.52s with it on (ortho sub-group = 4*4
procs): the serial run is about 85s faster, namely ~13%. By the way, in
this case 3.2.3 gave 8m49.79s (see above), which is a further ~6% gain.
I understand that the gain would increase very fast with the number of
electrons, but if you imagine this same test running for one day, the
difference may become relevant.
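To put numbers on this, here is a small Python sketch that computes the absolute and relative savings from the three total CPU times quoted above and scales them to a one-day run. The scaling assumes the relative difference stays constant for a longer job, which is only my assumption and not something I have measured; the function names are made up for the illustration.

```python
def to_seconds(minutes, seconds):
    """Convert minutes and seconds to seconds."""
    return 60 * minutes + seconds

# Total CPU times quoted above (16 cpus, 64 k-points, 1 pool)
t_323_serial = to_seconds(8, 49.79)     # QE 3.2.3
t_404_serial = to_seconds(9, 26.57)     # QE 4.0.4 with -ndiag 1
t_404_parallel = to_seconds(10, 51.52)  # QE 4.0.4, ortho sub-group = 4*4 procs

def saving(slow, fast):
    """Absolute saving in seconds and saving as a fraction of the slow run."""
    return slow - fast, (slow - fast) / slow

def scaled_saving(slow, fast, wall=24 * 3600):
    """Seconds saved on a job that takes `wall` seconds in the slow setup.

    ASSUMPTION: the relative difference does not change with job length.
    """
    return wall * (slow - fast) / slow

for label, slow, fast in [
    ("4.0.4 parallel diag -> 4.0.4 serial diag", t_404_parallel, t_404_serial),
    ("4.0.4 serial diag   -> 3.2.3", t_404_serial, t_323_serial),
]:
    abs_s, rel = saving(slow, fast)
    per_day_h = scaled_saving(slow, fast) / 3600
    print(f"{label}: {abs_s:.2f} s ({100 * rel:.1f} %), "
          f"about {per_day_h:.1f} h on a one-day run")
```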
Can it be due to wrong settings of my cluster?
I compiled both 3.2.3 and 4.0.4 using the same compiler (I never changed
it since the 1st installation of the machine), libraries, etc.
The make.sys is generated in both cases using the configure script of
the corresponding version. The only difference is that in one case
(3.2.3) I turned on the wannier library (-D__WANLIB) and that a "wrong" line
LDFLAGS = static -openmp
overrides the line preceding it in make.sys
LDFLAGS = -i-static -openmp
which instead is correctly reported for 4.0.4.
Dr. Giovanni Cantele
Coherentia CNR-INFM and Dipartimento di Scienze Fisiche
Universita' di Napoli "Federico II"
Complesso Universitario di Monte S. Angelo - Ed. 6
Via Cintia, I-80126, Napoli, Italy
Phone: +39 081 676910
Fax: +39 081 676346
E-mail: giovanni.cantele at cnr.it
giovanni.cantele at na.infn.it
Research Group: http://www.nanomat.unina.it