[Pw_forum] scaling on clusters with different communication types

Sat Dec 16 19:07:40 CET 2006

Dear Kader,

we usually test cp, rather than pwscf, and have only used gigabit
ethernet. Our clusters are divided into nodes of 4, 6, or 8
blades, where each blade is either a Pentium 4, a dual Xeon, or a
Core 2 Duo, and each node has its own private gigabit switch (i.e.
the master node connects to all blades through a master switch, and each
node of 4/6/8 blades has its own second, independent gigabit switch).

Intel processors deliver very good single-CPU performance, but
saturate the memory bus very rapidly. For this reason, in our
applications a dual Xeon is usually only 25% faster than a single
Pentium, and a dual chip / quad core Woodcrest is only ~30% faster than a
single chip / dual core at the same clock speed.
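
To put these percentages in terms of parallel efficiency, here is a small
Python sketch. It only restates the figures quoted above; the function name
is made up for illustration, and treating the dual Xeon vs. single Pentium
comparison as "twice the CPUs" is a simplification, since the processors
are not identical.

```python
def parallel_efficiency(speedup, resource_ratio):
    """Fraction of the ideal speedup obtained when the number of
    CPUs/cores grows by a factor of resource_ratio."""
    return speedup / resource_ratio

# Dual Xeon vs. single Pentium: 2x the CPUs, about 25% faster.
dual_xeon = parallel_efficiency(1.25, 2)

# Dual chip / quad core Woodcrest vs. single chip / dual core:
# 2x the cores, about 30% faster.
woodcrest = parallel_efficiency(1.30, 2)

print(f"dual Xeon: {dual_xeon:.1%} of ideal")
print(f"Woodcrest: {woodcrest:.1%} of ideal")
```

In both cases roughly a third of the added compute capacity is lost, which
is consistent with the memory bus being the bottleneck.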

One important statement: The quality of the gigabit switch and the
quality of the gigabit controller on the motherboard have an enormous
influence on the results.

You can find in http://quasiamore.mit.edu/ESPRESSO/CP90_tests/
a summary of our standard test - bulk AgI in a cubic cell of
30x30x30 a.u. with 108 atoms, 20 Ry cutoff (160 Ry cutoff on
the charge density), and 486 bands (i.e. 972 electrons).
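
As a quick consistency check on these parameters, the following Python
sketch recomputes the electron count, cell volume, and cutoff ratio. The
split into 54 Ag and 54 I atoms follows from the 1:1 stoichiometry; the
valence counts (11 for Ag, 7 for I) are an assumption about the
pseudopotentials, not something stated in this message.

```python
# Values taken from the test description above.
n_atoms = 108
n_bands = 486
cell_edge = 30.0      # a.u.
ecut_wfc = 20.0       # Ry, wavefunction cutoff
ecut_rho = 160.0      # Ry, charge-density cutoff

# Each band is doubly occupied in a spin-unpolarized calculation.
n_electrons = 2 * n_bands

# ASSUMPTION: Ag contributes 11 valence electrons (4d10 5s1) and
# I contributes 7 (5s2 5p5); this depends on the pseudopotentials used.
valence = {"Ag": 11, "I": 7}
n_ag = n_i = n_atoms // 2
n_electrons_from_atoms = n_ag * valence["Ag"] + n_i * valence["I"]

volume = cell_edge ** 3               # a.u.^3
cutoff_ratio = ecut_rho / ecut_wfc    # density cutoff / wavefunction cutoff

print(n_electrons, n_electrons_from_atoms, volume, cutoff_ratio)
```

Under that assumption both electron counts agree with the 972 quoted above.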

Note that the cp code has changed over the years, becoming slightly
slower at first, and faster lately.

Reading this should give you an idea of what to expect:
our key numbers are the times, in seconds, that it takes to
do 5 CP steps on this system. The best performance that we get,
at this stage, is 23.4s per CP step, on 6 Core 2 Duo:
http://quasiamore.mit.edu/ESPRESSO/CP90_tests/CP90.timings.large
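
If you want to turn timings of this kind into speedup and efficiency
figures for your own cluster, a minimal Python sketch could look like the
one below. The function name is hypothetical and the numbers in the example
are placeholders, not values taken from the timings file above.

```python
def scaling_table(timings, n_steps=5):
    """timings maps number of blades -> wall time in seconds for n_steps
    CP steps. Returns rows of (blades, seconds per step, speedup,
    efficiency), relative to the smallest blade count in the input."""
    base_n = min(timings)
    base_t = timings[base_n]
    rows = []
    for n in sorted(timings):
        t = timings[n]
        speedup = base_t / t
        efficiency = speedup / (n / base_n)
        rows.append((n, t / n_steps, speedup, efficiency))
    return rows

# PLACEHOLDER timings for illustration only (not measured data).
example = {1: 600.0, 2: 330.0, 4: 200.0}

for n, per_step, speedup, eff in scaling_table(example):
    print(f"{n} blades: {per_step:6.1f} s/step, "
          f"speedup {speedup:4.2f}, efficiency {eff:.0%}")
```

An efficiency that drops sharply when going from one blade count to the
next marks the point where communication starts to dominate.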

Let me summarize the most important points:

1) the quality of the gigabit switches makes an important difference.
None of the 6-7 currently available switches that we tested performed as
well as a much older one we purchased 2 years ago. We tested expensive
managed and unmanaged ones. We have no idea why this is.

2) the quality of the on-board ethernet controller is extremely
important. Our dual Xeon cluster, with Dell SC1425, scales much better
(up to 8 blades) than our latest Core 2 Duo cluster (with PDSMI+
motherboards, and specifically Intel 82573L/V ethernet controllers), which
scales to 3 blades at most. This is on reasonably comparable workloads, and
we cannot think of any variable other than the controller.

3) we have been experimenting with a new MPI protocol (MPI-GAMMA) from
Giuseppe Ciaccio in Genova, and this seems to be working very well, with
excellent scaling up to 6 Core 2 Duo blades. This is still experimental,
and we plan to release more complete info in January.

Thanks to Nicola Bonini, Arash Mostofi, and Young-Su Lee for all the 
careful tests listed above.

			nicola

Kara, Abdelkader wrote:
> Dear all,
> 
> Greetings.
> 
> I would appreciate it very much if you could share with me your experience
> of running pwscf on clusters with different communication hardware.
> I am interested in the scaling with the number of CPUs for the following 3
> different communication types:
> 1) gigabit ethernet
> 2) myrinet
> 3) InfiniBand
> 
> Thank you very much for your input on this matter
> 
> Kader Kara
> 
> Physics Department
> University of Central Florida
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum

-- 
---------------------------------------------------------------------
Prof Nicola Marzari   Department of Materials Science and Engineering
13-5066   MIT   77 Massachusetts Avenue   Cambridge MA 02139-4307 USA
tel 617.4522758 fax 2586534 marzari at mit.edu http://quasiamore.mit.edu