[Pw_forum] FFT & MPI - issues with latency
Nicola Marzari
marzari at MIT.EDU
Mon Jan 30 06:01:32 CET 2006
Dear Kostya, Alex,
this is our datapoint on reasonably good scaling for small
clusters (i.e. assuming that one should leave hardware
for massive parallelism to the experts):
http://nnn.mit.edu/ESPRESSO/CP90_tests/CP_2_1_3_half.pdf
We have a cluster of 32 Dell 1425 3.2 GHz dual-Xeons, grouped in 4 nodes
of 8 dual-Xeons each. Each node has its own private gigabit-ethernet network,
so we can run on up to 16 CPUs per node (each dual-Xeon in a node
is connected locally to the other dual-Xeons on the same node through a
dedicated 8-port gigabit switch; all dual-Xeons are also connected,
via their second gigabit port, to the master through a global master
switch that carries no MPI traffic.)
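(For what it's worth, keeping a job's MPI traffic on one node's private
switch is just a matter of the machinefile; a minimal sketch, assuming an
MPICH-style mpirun, with made-up hostnames for the blades behind node 2 and
placeholder input/output names - adapt to whatever your setup calls them:

   # machines.node2 - the 8 dual-Xeon blades on node 2's private switch,
   # two MPI tasks per blade (one per CPU)
   n2-b1:2
   n2-b2:2
   n2-b3:2
   n2-b4:2
   n2-b5:2
   n2-b6:2
   n2-b7:2
   n2-b8:2

   mpirun -np 16 -machinefile machines.node2 ./cp.x < half.in > half.out

so that only the local 8-port switch ever sees the all-to-all.)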
The results for a medium-sized job ("half", 243 bands of AgI
in a 30^3 a.u. cell) are on the page above - more details on the job are
in the timings file at http://nnn.mit.edu/ESPRESSO/CP90_tests/
(it's the "half" job, i.e. half the electrons of large.j).
As you can see, scaling up to 8 blades works unexpectedly well
(another slightly older cluster of ours saturated at
around 6 blades). Also, the second CPU on a dual-Xeon adds very
little, due to the limited memory bandwidth (Opterons would do much
better in scaling from 1 to 2 CPUs on the same motherboard, but
do not do as well in single-CPU performance).
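Since the thread is about latency: a quick ping-pong between two blades
tells you what round-trip the gigabit switch really delivers, which is what
the 3D-FFT all-to-all pays over and over at every step. A rough sketch,
nothing polished, just enough to see the number:

   /* pingpong.c - rough MPI latency test between ranks 0 and 1.
      mpicc -O2 pingpong.c -o pingpong ;  mpirun -np 2 ./pingpong      */
   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       const int reps = 1000;
       char msg = 0;
       int rank, i;
       double t0, t1;
       MPI_Status st;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       MPI_Barrier(MPI_COMM_WORLD);
       t0 = MPI_Wtime();
       for (i = 0; i < reps; i++) {
           if (rank == 0) {          /* rank 0 sends, then waits for the echo */
               MPI_Send(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
               MPI_Recv(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
           } else if (rank == 1) {   /* rank 1 just echoes everything back */
               MPI_Recv(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
               MPI_Send(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
           }
       }
       t1 = MPI_Wtime();

       if (rank == 0)
           printf("one-way latency ~ %.1f us\n",
                  (t1 - t0) / (2.0 * reps) * 1.0e6);

       MPI_Finalize();
       return 0;
   }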
So, if I were to purchase now, I would go with PIVs with the
fastest front-side bus and RAM (1066 MHz and DDR2-667/PC2-5300, respectively),
grouped in small nodes of ~6 CPUs.
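And if you get to test a box before buying, a crude triad loop (a poor man's
STREAM; the array size below is just a guess chosen to stay well out of
cache) run once alone and then as two copies at the same time shows
immediately how much a second CPU on a shared front-side bus can really add:

   /* triad.c - crude memory-bandwidth check (poor man's STREAM triad).
      gcc -O2 triad.c -o triad ;  time one copy, then two copies at once */
   #include <stdio.h>
   #include <stdlib.h>
   #include <time.h>

   #define N    4000000       /* 3 arrays x 32 MB: well outside any cache */
   #define REPS 10

   int main(void)
   {
       double *a = malloc(N * sizeof(double));
       double *b = malloc(N * sizeof(double));
       double *c = malloc(N * sizeof(double));
       long i;
       int r;
       clock_t t0, t1;
       double secs, mb;

       if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }

       for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

       t0 = clock();
       for (r = 0; r < REPS; r++)
           for (i = 0; i < N; i++)
               a[i] = b[i] + 3.0 * c[i];   /* 2 reads + 1 write per element */
       t1 = clock();

       secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
       mb   = (double)REPS * 3.0 * N * sizeof(double) / 1.0e6;
       /* printing a value from a[] keeps the loop from being optimized away */
       printf("~%.0f MB/s sustained (check value %g)\n", mb / secs, a[N / 2]);

       free(a); free(b); free(c);
       return 0;
   }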
nicola
Axel Kohlmeyer wrote:
> On Sun, 29 Jan 2006, Konstantin Kudin wrote:
>
> KK> Hi all,
>
> hi kostya,
>
---------------------------------------------------------------------
Prof Nicola Marzari Department of Materials Science and Engineering
13-5066 MIT 77 Massachusetts Avenue Cambridge MA 02139-4307 USA
tel 617.4522758 fax 2586534 marzari at mit.edu http://quasiamore.mit.edu