[Pw_forum] abysmal parallel performance of the CP code

Silviu Zilberman silviu at Princeton.EDU
Wed Sep 21 23:19:22 CEST 2005

Hi Kostya,

I am not sure this is the cause, but I have noticed similar problems in
the past. My impression then was that part of the difference arises
because the reported wall time, unlike the reported CPU time, includes
the time it takes to read the initial restart file and write the final
one to disk. If the cluster network is heavily loaded, writing big files
(hundreds of MB) may take several minutes (!). In CP the restart file is
not partitioned as it is in PW, so there is a lot of traffic in
collecting the data from all the nodes before it is actually written to
disk. In long runs you don't see the effect of this final disk write,
but with only 20 MD steps it may become dominant. Were you also writing
intermediate restart files during the 20 steps of the benchmark?
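
A rough way to put numbers on this, using the timings from your table
quoted below: the short Python sketch converts them to seconds and
prints, for each run, the gap between wall and CPU time and the
wall-clock speedup relative to one CPU. The helper names (to_seconds,
analyse) are my own and not part of CP. It only does arithmetic on the
reported numbers and does not measure where the time actually goes.

```python
import re


def to_seconds(text):
    """Convert a string such as '1h22m' or '45m33.41s' to seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    return sum(float(value) * units[unit]
               for value, unit in re.findall(r"([\d.]+)([hms])", text))


# (label, CPU time, wall time), copied from the quoted benchmark table.
TIMINGS = [
    ("1", "1h22m", "1h24m"),
    ("2", "45m33.41s", "57m13s"),
    ("4", "27m30.80s", "44m21s"),
    ("6", "18m22.71s", "43m18s"),
    ("8", "14m53.91s", "45m56s"),
    ("4 (quad)", "37m18.56s", "45m32s"),
]


def analyse(timings):
    """Return the wall-minus-CPU gap and the wall-clock speedup per run.

    The speedup is taken relative to the first entry (the 1-CPU run).
    """
    base_wall = to_seconds(timings[0][2])
    rows = []
    for label, cpu, wall in timings:
        cpu_s, wall_s = to_seconds(cpu), to_seconds(wall)
        rows.append({
            "ncpu": label,
            "gap_s": wall_s - cpu_s,
            "wall_speedup": base_wall / wall_s,
        })
    return rows


for row in analyse(TIMINGS):
    print(f"{row['ncpu']:>8}  gap {row['gap_s'] / 60:5.1f} min"
          f"  wall speedup {row['wall_speedup']:4.2f}")
```

For these numbers the wall-clock speedup on 4 CPUs comes out at about
1.9, and the wall-CPU gap grows from 2 minutes on one CPU to about 31
minutes on 8.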


Konstantin Kudin wrote:

> Hi,
> I've done some parallel benchmarks for the CP code so I thought I'd
>share them with the rest of the group. The system we have is a cluster
>of dual Opterons 2.0 GHz with 1 Gbit Ethernet.
> I looked at 2 different measures of time, CPU time, and wall time
>computed as the difference between "This run was started" and "This run
>was terminated". By the way, such wall time could probably be printed
>by the code directly to be readily available.
> The system is a reasonably sized simulation cell with 20 CP
>(electronic+ionic) steps total.
> The compiler is IFC 9.0, GOTO library is for BLAS, and mpich 1.2.6
>used for the MPI. The CP version is the CVS from Aug. 20, 2005.
> What is crazy is that even for 2 cpus sitting in the same box there is
>lots of cpu time just lost somewhere. The strange thing is that the
>quad we have at 2.2 GHz seems to lose just as much wall time as 2 duals
>talking across the network. And note how 4 cpus are barely better than
>2x compared to single cpu performance if the wall clock time is
> I know Nicola Marzari has done some parallel benchmarks, but I do not
>think that wall times were being paid attention to ...
> Kostya
>P.S. Any suggestions what might be going on here?
>Ncpu      CPU time     Wall time
>1         1h22m        1h24m
>2         45m33.41s    57m13s
>4         27m30.80s    44m21s
>6         18m22.71s    43m18s
>8         14m53.91s    45m56s
>4 (quad)  37m18.56s    45m32s

Zilberman Silviu
213 Frick Laboratory, Department of Chemistry 
Princeton University
Princeton, NJ 08544
phone: 609-258-1834
fax:   609-258-6746
silviu at Princeton.EDU
