[Pw_forum] abysmal parallel performance of the CP code
Silviu Zilberman
silviu at Princeton.EDU
Wed Sep 21 23:19:22 CEST 2005
Hi Kostya,
I am not sure this is the cause, but I have noticed similar problems in
the past. My impression then was that part of the difference comes from
the reported wall time including the time needed to read the initial
restart file and write the final one to disk, whereas the reported CPU
time does not include it. If the cluster network is heavily loaded, it
can take several minutes (!) to write big files (hundreds of MB). In CP
the restart file is not partitioned as it is in PW, so the data must be
collected from all the nodes before it is actually written to disk,
which generates a lot of traffic. In long runs the cost of this final
disk write is negligible, but with only 20 MD steps it may dominate.
Were you also writing intermediate restart files during the 20 steps of
the benchmark?
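A toy model makes the point about short runs concrete. It assumes (an assumption, not a measurement of CP) that the per-step work scales perfectly with the number of CPUs, while reading and writing the restart file costs a fixed amount of wall time that does not shrink as CPUs are added. The per-step time and I/O cost below are hypothetical placeholders.

```python
# Toy model (assumption): per-step compute scales perfectly with the number
# of CPUs, while reading/writing the restart file costs a fixed amount of
# wall time that does not shrink as CPUs are added.


def wall_time(steps: int, t_step: float, ncpu: int, t_io: float) -> float:
    """Modeled wall time in seconds: parallel compute plus fixed I/O."""
    return steps * t_step / ncpu + t_io


def io_fraction(steps: int, t_step: float, ncpu: int, t_io: float) -> float:
    """Fraction of the modeled wall time spent on the fixed I/O cost."""
    return t_io / wall_time(steps, t_step, ncpu, t_io)


if __name__ == "__main__":
    t_step = 240.0  # hypothetical: seconds per MD step on one CPU
    t_io = 300.0    # hypothetical: seconds to read and write restart files
    for steps in (20, 2000):
        for ncpu in (1, 8):
            frac = io_fraction(steps, t_step, ncpu, t_io)
            print(f"steps={steps:5d} ncpu={ncpu}: I/O share of wall time = {frac:.1%}")
```

With these placeholder numbers, the fixed I/O share of a 20-step run grows from about 6% on 1 CPU to about 33% on 8 CPUs, while for 2000 steps it stays below 1% in both cases.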
Silviu.
Konstantin Kudin wrote:
> Hi,
>
> I've done some parallel benchmarks for the CP code, so I thought I'd
>share them with the rest of the group. Our system is a cluster of dual
>2.0 GHz Opterons connected by 1 Gbit Ethernet.
>
> I looked at two measures of time: CPU time, and wall time, computed as
>the difference between "This run was started" and "This run was
>terminated". By the way, the code could probably print this wall time
>directly so that it is readily available.
>
> The test system is a reasonably sized simulation cell, run for 20 CP
>(electronic + ionic) steps in total.
>
> The compiler is IFC 9.0, the GOTO library is used for BLAS, and MPICH
>1.2.6 is used for MPI. The CP version is the CVS snapshot of Aug. 20, 2005.
>
> What is crazy is that even for 2 CPUs sitting in the same box, a lot
>of time is simply lost somewhere. Strangely, the 2.2 GHz quad we have
>seems to lose just as much wall time as 2 duals talking across the
>network. And note that, in terms of wall-clock time, 4 CPUs are barely
>2x faster than a single CPU.
>
> I know Nicola Marzari has done some parallel benchmarks, but I do not
>think much attention was paid to wall times...
>
> Kostya
>
>P.S. Any suggestions as to what might be going on here?
>
>
>Ncpu       CPU time    Wall time
>1          1h22m       1h24m
>2          45m33.41s   57m13s
>4          27m30.80s   44m21s
>6          18m22.71s   43m18s
>8          14m53.91s   45m56s
>
>4 (quad)   37m18.56s   45m32s
>_______________________________________________
>Pw_forum mailing list
>Pw_forum at pwscf.org
>http://www.democritos.it/mailman/listinfo/pw_forum
>
>
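The table can be turned into speedup and parallel-efficiency figures. This sketch uses wall time relative to the 1-CPU run, leaves out the quad row because it is a different machine, and assumes the CPU column is the per-process CPU time reported by the code. The "idle share" is (wall - CPU) / wall, roughly the fraction of wall time not counted as CPU time (for example, waiting on communication or I/O).

```python
# Timings from Kostya's table (dual-Opteron cluster runs only), in seconds.
# Assumption: the CPU column is the per-process CPU time reported by the code.


def to_seconds(h: int = 0, m: int = 0, s: float = 0.0) -> float:
    """Convert hours/minutes/seconds to seconds."""
    return 3600 * h + 60 * m + s


# (ncpu, cpu_time, wall_time); the first entry must be the 1-CPU reference run
RUNS = [
    (1, to_seconds(1, 22), to_seconds(1, 24)),
    (2, to_seconds(m=45, s=33.41), to_seconds(m=57, s=13)),
    (4, to_seconds(m=27, s=30.80), to_seconds(m=44, s=21)),
    (6, to_seconds(m=18, s=22.71), to_seconds(m=43, s=18)),
    (8, to_seconds(m=14, s=53.91), to_seconds(m=45, s=56)),
]


def summarize(runs):
    """Return (ncpu, speedup, efficiency, idle_share) based on wall time."""
    _, _, wall_1 = runs[0]  # wall time of the single-CPU reference run
    rows = []
    for ncpu, cpu, wall in runs:
        speedup = wall_1 / wall           # wall-clock speedup vs 1 CPU
        efficiency = speedup / ncpu       # ideal value is 1.0
        idle_share = (wall - cpu) / wall  # wall time not spent as CPU time
        rows.append((ncpu, speedup, efficiency, idle_share))
    return rows


if __name__ == "__main__":
    print("Ncpu  speedup  efficiency  idle share")
    for ncpu, speedup, eff, idle in summarize(RUNS):
        print(f"{ncpu:4d}  {speedup:7.2f}  {eff:10.1%}  {idle:10.1%}")
```

From these numbers the wall-time speedup is about 1.47 on 2 CPUs and stays below 2 (about 1.89, 1.94 and 1.83) on 4, 6 and 8 CPUs, while the idle share on 8 CPUs is roughly two thirds.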
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Zilberman Silviu
213 Frick Laboratory, Department of Chemistry
Princeton University
Princeton, NJ 08544
phone: 609-258-1834
fax: 609-258-6746
silviu at Princeton.EDU
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%