<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content=text/html;charset=gb2312>
<META content="MSHTML 6.00.6000.16705" name=GENERATOR></HEAD>
<BODY id=MailContainerBody
style="PADDING-RIGHT: 10px; PADDING-LEFT: 10px; PADDING-TOP: 15px"
bgColor=#ffffff leftMargin=0 topMargin=0 CanvasTabStop="true"
name="Compose message area">
<DIV><FONT face=Arial size=2>Dear all,</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I have just finished a relax calculation for a
120-atom system. When the calculation was done, the output file reported the
following:</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2> Program
PWSCF v.4.0.1 starts
...<BR> Today is 16Sep2008 at 19:14:42 </FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Parallel version
(MPI)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Number of processors in
use: 78<BR> K-points
division: npool
= 3<BR> R & G space
division: proc/pool = 26</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> For Norm-Conserving or
Ultrasoft (Vanderbilt) Pseudopotentials or PAW</FONT></DIV>
<DIV><FONT face=Arial size=2>................................</FONT></DIV>
<DIV><FONT face=Arial size=2> per-process dynamical
memory: 129.6 Mb</FONT></DIV>
<DIV><FONT face=Arial size=2>................................</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>
PWSCF :
0d 14h46m CPU time,
2d 18h 4m wall time</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2>
init_run : 91.49s
CPU<BR> electrons : 47137.56s CPU
( 27 calls,1745.836 s
avg)<BR> update_pot : 187.80s
CPU ( 26 calls, 7.223 s
avg)<BR> forces
: 4492.20s CPU ( 27 calls, 166.378 s
avg)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Called by
init_run:<BR> wfcinit
: 23.68s CPU<BR>
potinit : 3.15s
CPU</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Called by
electrons:<BR> c_bands :
23198.29s CPU ( 258 calls, 89.916 s
avg)<BR> sum_band : 11159.67s
CPU ( 258 calls, 43.255 s
avg)<BR> v_of_rho :
167.39s CPU ( 280 calls, 0.598 s
avg)<BR>
newd : 13679.79s CPU
( 280 calls, 48.856 s
avg)<BR> mix_rho
: 30.14s CPU ( 258 calls,
0.117 s avg)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Called by
c_bands:<BR> init_us_2
: 48.95s CPU ( 517 calls,
0.095 s avg)<BR> cegterg :
23038.94s CPU ( 258 calls, 89.298 s
avg)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Called by
*egterg:<BR>
h_psi : 8629.82s CPU
( 1459 calls, 5.915 s
avg)<BR> s_psi
: 2230.78s CPU ( 1459 calls, 1.529 s
avg)<BR> g_psi
: 34.68s CPU ( 1200 calls, 0.029
s avg)<BR> cdiaghg :
5929.74s CPU ( 1427 calls, 4.155 s
avg)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> Called by
h_psi:<BR> add_vuspsi : 2209.17s CPU
( 1459 calls, 1.514 s avg)</FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Arial size=2> General
routines<BR> calbec
: 2904.12s CPU ( 1744 calls, 1.665 s
avg)<BR> cft3s
: 4337.89s CPU ( 950068 calls, 0.005 s
avg)<BR> interpolate : 34.87s
CPU ( 538 calls, 0.065 s
avg)<BR> <BR> Parallel
routines<BR> fft_scatter : 538.83s CPU
( 950068 calls, 0.001 s avg)</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
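<DIV><FONT face=Arial size=2>As a rough reading of the log above, the following
Python sketch ranks the routines listed under "Called by electrons" by CPU time
and shows each one relative to the electrons total. The numbers are copied from
the log; the variable names are only illustrative, and the shares are a rough
guide that need not add up to exactly 100%.</FONT></DIV>
<PRE>
```python
# CPU seconds copied from the "Called by electrons" part of the log above.
electrons_total = 47137.56
called_by_electrons = {
    "c_bands": 23198.29,
    "sum_band": 11159.67,
    "v_of_rho": 167.39,
    "newd": 13679.79,
    "mix_rho": 30.14,
}

# Sort the routines by cost, most expensive first.
ranked = sorted(called_by_electrons.items(), key=lambda kv: kv[1], reverse=True)

# Print each routine with its share of the electrons total.
for name, seconds in ranked:
    share = seconds / electrons_total
    print(f"{name:10s} {seconds:10.2f} s  {share:6.1%}")
```
</PRE>
<DIV><FONT face=Arial size=2>In this log, c_bands, newd and sum_band account
for almost all of the time spent in electrons.</FONT></DIV>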
<DIV><FONT face=Arial size=2>From the reported timings, the efficiency of my
calculation looks quite low: the CPU time (14h46m) is only a small fraction of
the wall time (2d 18h 4m).</FONT></DIV>
<DIV><FONT face=Arial size=2>I suspect that something in my setup is not up to
standard.</FONT></DIV>
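<DIV><FONT face=Arial size=2>To put a number on this, here is a minimal Python
sketch that converts the CPU and wall times from the PWSCF summary line into
seconds and takes their ratio. The helper name to_seconds is only illustrative.
With the values above, the CPU time comes out at roughly 22% of the wall
time.</FONT></DIV>
<PRE>
```python
def to_seconds(days, hours, minutes):
    """Convert a days/hours/minutes duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60

# Values copied from the "PWSCF :" summary line of the log.
cpu_s = to_seconds(0, 14, 46)    # 0d 14h46m CPU time
wall_s = to_seconds(2, 18, 4)    # 2d 18h 4m wall time

# Fraction of the elapsed time that was actually spent computing.
cpu_fraction = cpu_s / wall_s

print(f"CPU time  : {cpu_s} s")
print(f"wall time : {wall_s} s")
print(f"CPU/wall  : {cpu_fraction:.1%}")
```
</PRE>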
<DIV><FONT face=Arial size=2>To make the problem easier to understand, here is
more information about my software, hardware and simulation model.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Simulation model: my system is a slab model of
a metal oxide surface with 3 irreducible k-points.</FONT></DIV>
<DIV><FONT face=Arial size=2>The pseudopotentials are ultrasoft (Vanderbilt)
pseudopotentials.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Hardware: each node has two single-core Xeon CPUs
and 2G of physical memory. The network is InfiniBand with 10G
bandwidth.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Software: my Fortran and C compilers are Intel
version 10.0.015. MKL is l_mkl_p_10.0.3.020.tgz.</FONT></DIV>
<DIV><FONT face=Arial size=2>FFTW is fftw-2.1.5.tar.gz. MPI is
mpich2-1.0.7.tar.gz. All of the above are installed in an NFS
location.</FONT></DIV>
<DIV><FONT face=Arial size=2>QE was also compiled in an NFS location, and
outdir was on NFS as well. wfcdir was on the local disk, under /tmp/. To
reduce the I/O, I also set disk_io = 'none'.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Could you tell me what makes my CPUs run with
such low efficiency? Are there any hints for improving the parallel
efficiency?</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Do you think 10G InfiniBand is good enough for 39
nodes? Is it unnecessary to keep so many files on NFS? Could you tell me which
directories must be on NFS so that all the nodes can read and write
them?</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I also noticed that pw.x reported that 129.6 Mb of
per-process dynamical memory was required. In practice, however, I found that
virtual memory was being used. Do you think pw.x greatly underestimates the
memory requirement?</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Thank you for reading.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Any hints on my problem would be deeply
appreciated.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>vega</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial
size=2>=================================================================================<BR>Vega
Lew (weijia liu)<BR>PH.D Candidate in Chemical Engineering<BR>State Key
Laboratory of Materials-oriented Chemical Engineering<BR>College of Chemistry
and Chemical Engineering<BR>Nanjing University of Technology, 210009, Nanjing,
Jiangsu, China</FONT></DIV></BODY></HTML>