[Pw_forum] FFT & MPI - issues with latency

Nicola Marzari marzari at MIT.EDU
Mon Jan 30 06:01:32 CET 2006


Dear Kostya, Alex,

this is our datapoint on reasonably good scaling for small
clusters (i.e. assuming that one should leave hardware
for massive parallelism to the experts):
http://nnn.mit.edu/ESPRESSO/CP90_tests/CP_2_1_3_half.pdf

We have a cluster of 32 Dell 1425 3.2 GHz dual-xeons, grouped in 4 nodes
of 8 dual-xeons each. Each node has its own private gigabit ethernet
network, so we can run on up to 16 CPUs within a node (each dual-xeon
is connected locally to the other dual-xeons in the same node through a
dedicated 8-port gigabit switch; all dual-xeons are also connected,
through their second gigabit port, to the master via a global master
switch that carries no MPI traffic).
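Just to make the setup concrete, this is roughly how a 16-process run
gets confined to the eight dual-xeons behind one private switch. The
host names below are made up, and the exact flags depend on the MPI
implementation (MPICH's mpirun takes -machinefile, for instance); the
input file name half.in is just a placeholder:

   # machines.node1: the 8 dual-xeons sharing one dedicated switch,
   # each listed twice so that both CPUs per box get a process
   node1-01
   node1-01
   node1-02
   node1-02
   node1-03
   node1-03
   node1-04
   node1-04
   node1-05
   node1-05
   node1-06
   node1-06
   node1-07
   node1-07
   node1-08
   node1-08

   mpirun -np 16 -machinefile machines.node1 ./cp.x < half.in > half.out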

The results for a medium-sized job ("half": 243 bands of AgI
in a 30^3 a.u. cell) are on the page above - more details on the job are
in http://nnn.mit.edu/ESPRESSO/CP90_tests/, in the timings
file (it's the "half" job, i.e. half the electrons of large.j).

As you can see, scaling up to 8 blades works unexpectedly well
(another slightly older cluster of ours saturated at
around 6 blades). Also, the second CPU on a dual-xeon adds very
little, due to the limited memory bandwidth (Opterons would do much
better in scaling from 1 to 2 CPUs on the same motherboard, but
do not do as well in single-CPU performance).
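Since the scaling limit over gigabit is essentially communication (the
parallel 3D FFT does an all-to-all style transpose, which is exactly
where latency hurts), a quick way to see where a given interconnect
gives up is to time MPI_Alltoall directly for growing message sizes.
The sketch below is a generic micro-benchmark, not anything taken from
the CP code; the buffer sizes and repetition count are arbitrary:

/* alltoall_bench.c - generic MPI_Alltoall timing loop (not from CP).
   The transpose step of a distributed 3D FFT is essentially an
   all-to-all, so this gives a rough idea of where a given network
   goes from latency-bound to bandwidth-bound. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int nproc, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 50;
    /* message size per rank pair, in doubles:
       small = latency-bound, large = bandwidth-bound */
    for (int count = 16; count <= 65536; count *= 4) {
        double *sendbuf = malloc((size_t)count * nproc * sizeof(double));
        double *recvbuf = malloc((size_t)count * nproc * sizeof(double));
        for (int i = 0; i < count * nproc; i++)
            sendbuf[i] = (double)rank;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++)
            MPI_Alltoall(sendbuf, count, MPI_DOUBLE,
                         recvbuf, count, MPI_DOUBLE, MPI_COMM_WORLD);
        double t = (MPI_Wtime() - t0) / reps;

        if (rank == 0)
            printf("%3d ranks, %7d doubles per rank pair: %9.6f s per all-to-all\n",
                   nproc, count, t);

        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}

Compile it with mpicc and run it on 2, 4, 8, ... boxes of whatever
fabric you are considering; when the time per all-to-all stops
improving as you add machines, that is roughly where the real run will
stop scaling as well.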

So, if I were to purchase now, I would go with Pentium 4s with the
fastest front-side bus and RAM available (1066 MHz and DDR2-667/PC2-5300,
respectively), grouped in small nodes of ~6 CPUs.


				nicola



---------------------------------------------------------------------
Prof Nicola Marzari   Department of Materials Science and Engineering
13-5066   MIT   77 Massachusetts Avenue   Cambridge MA 02139-4307 USA
tel 617.4522758 fax 2586534 marzari at mit.edu http://quasiamore.mit.edu


