[Pw_forum] how to improve the calculation speed ?

Wed Sep 23 10:45:51 CEST 2009

wangqj1 wrote:
>
> Dear PWSCF users,
> When I use R and G parallelization to run a job, it looks as if it is 
> waiting for input.

What does that mean? Does it print the output header, or the output up 
to some point, or does nothing happen at all?

> Following people's advice, I used k-point parallelization, and it runs 
> well. But it runs too slowly. The information I can offer is as follows:
> (1): CPU usage of one node is:
> Tasks: 143 total, 10 running, 133 sleeping, 0 stopped, 0 zombie
> Cpu0 : 99.7%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu5 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 :100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 8044120k total, 6683720k used, 1360400k free, 1632k buffers
> Swap: 4192956k total, 2096476k used, 2096480k free, 1253712k cached
I'm no expert at reading this kind of output, but it seems that your 
node is swapping (about half of the swap space is in use), probably 
because the job requires more memory than is available. This usually 
causes a huge performance degradation.
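
As a rough illustration of how the two summary lines above can be read, 
here is a minimal Python sketch. It is not part of Quantum ESPRESSO; the 
function names are made up for this example, and it assumes the "Mem:" / 
"Swap:" layout exactly as printed by top in the quoted output (other 
versions of top may format these lines differently).

```python
import re

# Summary lines copied from the top output quoted above.
TOP_SUMMARY = """\
Mem: 8044120k total, 6683720k used, 1360400k free, 1632k buffers
Swap: 4192956k total, 2096476k used, 2096480k free, 1253712k cached
"""


def parse_top_summary(text):
    """Return {'Mem': {...}, 'Swap': {...}} with values in kB.

    Hypothetical helper: it only understands lines of the form
    'Label: <number>k <name>, <number>k <name>, ...'.
    """
    result = {}
    for line in text.splitlines():
        # Tolerate the "> " prefix that mail clients add to quoted text.
        label, sep, rest = line.lstrip("> ").partition(":")
        if not sep or label not in ("Mem", "Swap"):
            continue
        result[label] = {
            name: int(value)
            for value, name in re.findall(r"(\d+)k\s+(\w+)", rest)
        }
    return result


def used_fraction(fields):
    """Fraction of the total that is currently in use."""
    return fields["used"] / fields["total"]


usage = parse_top_summary(TOP_SUMMARY)
mem_frac = used_fraction(usage["Mem"])
swap_frac = used_fraction(usage["Swap"])
print(f"RAM in use:  {mem_frac:.1%}")
print(f"Swap in use: {swap_frac:.1%}")
```

With the numbers you posted, this gives roughly 83% of the RAM and about 
half of the swap in use. A large amount of swap in use while all eight 
cores are busy is what suggests that the job does not fit in memory.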

In choosing the optimal number of nodes, processes per node, etc., 
several factors should be taken into account: memory requirements, 
communication hardware, etc. You might want to have a look at this page 
from the user guide: http://www.quantum-espresso.org/user_guide/node33.html

Also, consider that, at least on CPUs that are not of the very latest 
generation, using all the cores of each CPU (e.g. if your cluster is 
built from quad-core processors) might not improve the performance of 
the code, and might even make it worse (this has also been reported in 
previous threads on this forum; you can search for them).
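
One way to check this on your own cluster is to time the same small job 
with different numbers of processes per node and compare. The sketch 
below is only an illustration: the function name is made up, and the 
wall times in it are HYPOTHETICAL placeholders, not measurements. You 
would replace them with the wall times of your own runs.

```python
def parallel_efficiency(timings):
    """Compute speedup and efficiency relative to the smallest run.

    timings: dict mapping number of processes -> wall time in seconds.
    Returns a dict mapping number of processes -> (speedup, efficiency).
    """
    base_n = min(timings)
    base_t = timings[base_n]
    report = {}
    for n in sorted(timings):
        speedup = base_t / timings[n]
        # Efficiency 1.0 means perfect scaling with respect to base_n.
        efficiency = speedup * base_n / n
        report[n] = (speedup, efficiency)
    return report


# HYPOTHETICAL wall times (seconds) for 1, 2, 4 and 8 processes on one
# node; replace them with the times measured for your own test job.
example_timings = {1: 800.0, 2: 420.0, 4: 250.0, 8: 260.0}

report = parallel_efficiency(example_timings)
for n, (speedup, efficiency) in report.items():
    print(f"{n:2d} processes: speedup {speedup:5.2f}, "
          f"efficiency {efficiency:5.1%}")
```

If the speedup stops growing (or drops) when going from, say, 4 to 8 
processes per node, it is probably better to leave some cores idle.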

This may also be of interest to you:
http://www.quantum-espresso.org/wiki/index.php/Frequently_Asked_Questions#Why_is_my_parallel_job_running_in_such_a_lousy_way.3F

> I don't know why it runs so slowly. How can I solve this problem? Any 
> advice will be appreciated!
Leaving aside better suggestions that may come from more expert people, 
it would be important to know what kind of job you are trying to run. 
For example: did you start directly with a "production run" (many 
k-points and/or large unit cells and/or a large cut-off)? Has pw.x ever 
run successfully on your cluster with simple jobs, such as bulk silicon 
or any of the others in the examples directory?

Another possibility would be to start with the serial executable 
(disabling parallel execution at configure time) and then switch to the 
parallel one once you have checked that everything works.

Unfortunately, in many cases a calculation requires a lot of work to 
set up correctly and to optimize compilation, performance, etc. (not to 
mention convergence of the results!!!!).
The only way is to try to isolate the problems and solve them one by 
one. Yet, I would say that in this respect quantum-espresso is one of 
the best choices, since the code is made to work properly in as many 
cases as possible, rather than implementing all of human knowledge but 
only for those who wrote it!!!
;-)

Good luck,

Giovanni

-- 

Dr. Giovanni Cantele
Coherentia CNR-INFM and Dipartimento di Scienze Fisiche
Universita' di Napoli "Federico II"
Complesso Universitario di Monte S. Angelo - Ed. 6
Via Cintia, I-80126, Napoli, Italy
Phone: +39 081 676910
Fax:   +39 081 676346
E-mail: giovanni.cantele at cnr.it
        giovanni.cantele at na.infn.it
Web: http://people.na.infn.it/~cantele
Research Group: http://www.nanomat.unina.it
Skype contact: giocan74