[Pw_forum] why my pw.x run with low efficiency?

vega vegalew at hotmail.com
Fri Sep 19 18:54:08 CEST 2008


Dear all,

I just finished a relax calculation on 120 atoms. After the calculation was done, the output file reported the following:

    Program PWSCF     v.4.0.1  starts ...
     Today is 16Sep2008 at 19:14:42 

     Parallel version (MPI)

     Number of processors in use:      78
     K-points division:     npool     =    3
     R & G space division:  proc/pool =   26

     For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW
................................
     per-process dynamical memory:   129.6 Mb
................................

     PWSCF        :     0d   14h46m CPU time,        2d   18h 4m wall time

     init_run     :    91.49s CPU
     electrons    : 47137.56s CPU (      27 calls,1745.836 s avg)
     update_pot   :   187.80s CPU (      26 calls,   7.223 s avg)
     forces       :  4492.20s CPU (      27 calls, 166.378 s avg)

     Called by init_run:
     wfcinit      :    23.68s CPU
     potinit      :     3.15s CPU

     Called by electrons:
     c_bands      : 23198.29s CPU (     258 calls,  89.916 s avg)
     sum_band     : 11159.67s CPU (     258 calls,  43.255 s avg)
     v_of_rho     :   167.39s CPU (     280 calls,   0.598 s avg)
     newd         : 13679.79s CPU (     280 calls,  48.856 s avg)
     mix_rho      :    30.14s CPU (     258 calls,   0.117 s avg)

     Called by c_bands:
     init_us_2    :    48.95s CPU (     517 calls,   0.095 s avg)
     cegterg      : 23038.94s CPU (     258 calls,  89.298 s avg)

     Called by *egterg:
     h_psi        :  8629.82s CPU (    1459 calls,   5.915 s avg)
     s_psi        :  2230.78s CPU (    1459 calls,   1.529 s avg)
     g_psi        :    34.68s CPU (    1200 calls,   0.029 s avg)
     cdiaghg      :  5929.74s CPU (    1427 calls,   4.155 s avg)

     Called by h_psi:
     add_vuspsi   :  2209.17s CPU (    1459 calls,   1.514 s avg)

     General routines
     calbec       :  2904.12s CPU (    1744 calls,   1.665 s avg)
     cft3s        :  4337.89s CPU (  950068 calls,   0.005 s avg)
     interpolate  :    34.87s CPU (     538 calls,   0.065 s avg)
 
     Parallel routines
     fft_scatter  :   538.83s CPU (  950068 calls,   0.001 s avg)
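
For reference, I launched the job roughly as follows; the input and output file names here are only placeholders:

     mpirun -np 78 pw.x -npool 3 < slab_relax.in > slab_relax.out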

From the reported information, you can see that the efficiency of my calculation is quite low: the reported CPU time (14h46m) is only about 22% of the wall time (2d18h), so I think it is not up to snuff.
To help you understand my problem, I will give more details about my simulation model, hardware, and software.

Simulation model: my system is a slab model of a metal oxide surface, sampled with 3 irreducible k-points.
I used ultrasoft (Vanderbilt) pseudopotentials.

Hardware: each node has two single-core Xeon CPUs and 2 GB of physical memory. The nodes are
connected by InfiniBand with 10 Gb/s bandwidth.

Software: my Fortran and C compilers are Intel 10.0.015; the math library is MKL 10.0.3.020, the FFT
library is FFTW 2.1.5, and the MPI library is MPICH2 1.0.7. All of the above are installed on an NFS share.
My QE executables were also compiled in an NFS location, and outdir is on NFS as well. wfcdir points to the
local disk of each node (the /tmp/ folder). To reduce the I/O, I also set disk_io = 'none'.
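
For concreteness, the relevant part of my &CONTROL namelist looks roughly like this (the prefix and paths are placeholders for the real ones):

     &CONTROL
        calculation = 'relax'
        prefix      = 'slab'
        pseudo_dir  = '/nfs/pseudo/'    ! on the NFS share
        outdir      = '/nfs/scratch/'   ! also on NFS
        wfcdir      = '/tmp/'           ! local disk on each node
        disk_io     = 'none'            ! to reduce I/O
     /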

Could you tell me what makes my CPUs run with such low efficiency? Are there any hints for improving the
parallel performance?

Do you think 10 Gb/s InfiniBand is good enough for 39 nodes? Is it unnecessary to put so many files
on NFS? Could you tell me which directories must be on an NFS location so that all the nodes can read
and write them?

I also noticed that pw.x reported 129.6 Mb of per-process dynamical memory. In practice, however, I found that
virtual memory (swap) was being used. Do you think pw.x greatly underestimates the memory requirement?
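
My rough estimate, assuming both MPI processes on a node each need about the reported amount, is

     2 processes/node x 129.6 Mb = 259.2 Mb per node, far below the 2 Gb of physical memory,

so if the estimate were accurate the nodes should not need to swap at all.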

Thank you for reading.

Any hints on my problem will be deeply appreciated.

vega

=================================================================================
Vega Lew (weijia liu)
Ph.D. Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China