Fw: [Pw_forum] Re: Woodcrest vs Opteron performance in pwscf calc.
hqzhou at nju.edu.cn
Fri Aug 11 04:18:37 CEST 2006
It seems that maintenance work on our campus network has disrupted our mail
these two days. My reply to the pw forum is in my "Sent" box, but I haven't
seen its arrival in the mailing list although three days have passed.
Here I forward my post. If you by any means had received the mail 2 days
ago, I apologize for the inconvenience to you.
----- Original Message -----
From: "Huiqun Zhou" <hqzhou at nju.edu.cn>
To: <pw_forum at pwscf.org>
Sent: Wednesday, August 09, 2006 7:22 PM
Subject: Re: [Pw_forum] Re: Woodcrest vs Opteron performance in pwscf calc.
> Kostya and list-users,
> Thanks for your comment and recommendation. Considering the
> of machines with Dempsey and Woodcrest, Dempsey may be a good choice too,
> especially if you have no problem paying the electricity bill ;-)
> You mentioned enabling NUMA on Opteron machines. I wonder if it's a
> function of kernel 2.6.9-xx. If it's not, I need to turn it on when
> reconfiguring the kernel and then recompile, right?
> ----- Original Message -----
> From: "Konstantin Kudin" <konstantin_kudin at yahoo.com>
> To: <pw_forum at pwscf.org>
> Sent: Tuesday, August 08, 2006 12:00 AM
> Subject: Re: [Pw_forum] Re: Woodcrest vs Opteron performance in pwscf
>> Dempsey and Opterons do 2 BLAS operations per cycle, while Woodcrest
>> does 4. So effectively you get these frequencies for BLAS (per core):
>> Woodcrest (2.66x4=10.6), Dempsey (3.2x2=6.4), Opteron (2.6x2=5.2).
>> That is exactly the order you get in terms of performance. Your Opteron
>> scaling is not too good, which either suggests that there is not enough
>> memory bandwidth, or you do not have NUMA turned on.
>> Now, the theoretical performance would translate into the real world
>> if the memory is fast enough. I think both Dempsey and Woodcrest use
>> the same chipset with 2 buses, so earlier memory contention issues with
>> multiple Intel chips are mostly gone for now. Still, you see that with
>> 4 Woodcrest cores the speedups are worse than for Dempsey, which
>> suggests that perhaps the optimal purchase for QE would be lower
>> frequency chips, such as 2.0 or 2.33 GHz, since four 2.66 GHz cores are too
>> fast for the memory.
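
The per-core BLAS figures quoted above are simply clock frequency multiplied by BLAS operations per cycle. A minimal sketch of that arithmetic follows; the dictionary and function names are made up for illustration, and the numbers are only the ones given in the message, not measurements:

```python
# Clock frequency (GHz) and BLAS operations per cycle, as quoted in the message.
# The names below are illustrative assumptions, not part of any PWscf/QE tool.
chips = {
    "Woodcrest": (2.66, 4),
    "Dempsey": (3.2, 2),
    "Opteron": (2.6, 2),
}


def effective_blas_ghz(clock_ghz, ops_per_cycle):
    """Per-core 'effective frequency' for BLAS: clock times operations per cycle."""
    return clock_ghz * ops_per_cycle


# Compute the effective figure for each chip and rank from fastest to slowest.
effective = {name: effective_blas_ghz(*spec) for name, spec in chips.items()}
ranking = sorted(effective, key=effective.get, reverse=True)

for name in ranking:
    print(f"{name}: {effective[name]:.1f}")
```

This is only the theoretical peak ordering; as the message notes, real speedups also depend on whether the memory can keep up with the cores.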