[Pw_forum] Insufficient Virtual Memory

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Sat Feb 28 19:23:35 CET 2009


On Fri, Feb 27, 2009 at 6:34 PM, Vo, Trinh <trinh.vo at jpl.nasa.gov> wrote:
> Dear Axel,
>
> Thanks for the clarification.
>
> About the benchmarks, I simply want to see how well the cluster we bought
> performs in terms of scaling with QE.  I sent you some plots, but the
> email did not go through because of the size restriction (larger than 40K).
>
> Currently, I am not happy that the difference between CPU time and wall
> time is so large.  When I ran a longer job, which took ~2h of CPU time,
> the wall time was ~7h when I ran it from the head node, and ~4h when I

that probably means you ran a job that was too big for the machine, which
was then swapping all the time.
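
a quick way to confirm this is to watch the swap activity on the node
while the job runs, e.g. with standard linux tools (a generic sketch,
not specific to your cluster):

  # report memory/swap statistics every 5 seconds; consistently non-zero
  # 'si'/'so' columns mean pages are being swapped in/out
  vmstat 5

  # one-shot summary of memory and swap usage, in megabytes
  free -m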

for your reference, here are some numbers from one of our local clusters.
each node of the machine has 2x Intel Xeon E5430 @ 2.66GHz and 8GB of RAM,
and the nodes are connected by a 2xDDR InfiniBand interconnect.

this first block shows runs on four nodes with different -npernode settings:
h2o-32-4x2.out:     CP           :    18.33s CPU time,   18.79s wall time
h2o-32-4x4.out:     CP           :    16.75s CPU time,   17.50s wall time
h2o-32-4x8.out:     CP           :    25.31s CPU time,   25.94s wall time
h2o-64-4x1.out:     CP           :  2m50.18s CPU time,     3m18.88s wall time
h2o-64-4x2.out:     CP           :  1m29.72s CPU time,     1m33.60s wall time
h2o-64-4x4.out:     CP           :  1m12.42s CPU time,     1m13.70s wall time
h2o-64-4x8.out:     CP           :  1m19.53s CPU time,     1m20.86s wall time

as you can see, just as with cp2k, using 8 cores per node hurts performance,
especially for smaller jobs, and using 4 cores per node is a much better choice.
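
for reference, the runs above were launched along these lines (a sketch
assuming Open MPI's mpirun; the executable path and input/output file
names are placeholders, not the exact ones from these runs):

  # 4 nodes x 4 MPI tasks per node = 16 tasks total
  mpirun -np 16 -npernode 4 cp.x < h2o-64.in > h2o-64-4x4.out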

and here are the corresponding single-node times (run on the frontend):
h2o-32-np1.out:     CP           :  2m24.38s CPU time,     2m39.38s wall time
h2o-32-np2.out:     CP           :  1m24.22s CPU time,     1m42.09s wall time
h2o-32-np4.out:     CP           :    48.92s CPU time,   51.58s wall time
h2o-32-np8.out:     CP           :    41.89s CPU time,   42.72s wall time
h2o-64-np2.out:     CP           :  6m39.17s CPU time,     7m49.54s wall time
h2o-64-np4.out:     CP           :  4m19.69s CPU time,     5m14.73s wall time
h2o-64-np8.out:     CP           :  4m12.16s CPU time,     4m24.57s wall time

the saturation of the memory bandwidth becomes apparent (there is little
gain going from 4 MPI tasks to 8 MPI tasks). you have to keep in mind that
on the intel quad cores the difference between using 4 cores and 8 cores
is especially drastic, since each pair of cores shares an L2 cache, so
with 4 cores per node i effectively have twice the L2 cache per task
compared to 8 cores. it would be interesting to see somebody do a similar
test with AMD quad cores, since those are true quad cores.

you should also note that those timings contain some non-parallel overhead
incurred when starting a job. to measure production speed, you should run
a 20-step and a 10-step job and then subtract the time of the 10-step job
from that of the 20-step job to get the timing for 10 steps, as sketched
below.
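
as a sketch, with made-up wall times (substitute the numbers from your
own runs):

  # subtract the 10-step run from the 20-step run to cancel the
  # startup overhead
  t20=93.60   # wall time of the 20-step job in seconds (placeholder)
  t10=51.30   # wall time of the 10-step job in seconds (placeholder)
  echo "10 production steps took $(echo "$t20 - $t10" | bc) s"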

HTH,
   axel.



-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
  Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.


