[Pw_forum] QE on Xeon Phi

Axel Kohlmeyer akohlmey at gmail.com
Mon Jul 14 16:46:48 CEST 2014


On Mon, Jul 14, 2014 at 9:34 AM, Eduardo Menendez <eariel99 at gmail.com> wrote:
> Thank you Axel. Your advice raises another doubt. Can we get the maximum
> performance from a highly clocked CPU?
> I used to consider that the fastest CPUs were too fast for the memory
> access, resulting in bottlenecks. Of course it depends on cache size.

your concern is justified, but the situation is more complex these
days. highly clocked CPUs have fewer cores, so each core receives a
larger share of the available memory bandwidth. in addition, the
highest clocked inter-CPU and memory bus is only available for a
subset of the CPUs. so now you have an optimization problem that has
to consider the strong scaling (or lack thereof) of the code in
question as an additional input parameter.
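
to illustrate how strong scaling enters this trade-off, here is a
minimal sketch using a plain Amdahl-style model. it is only a toy:
it assumes runtime scales inversely with clock, ignores memory
bandwidth and cache effects entirely, and the function names and the
parallel fractions are made up for illustration. the two options use
the clock/core combinations from the example below.

```python
def relative_runtime(parallel_fraction, cores, clock_ghz):
    """Amdahl-style runtime estimate in arbitrary units (lower is faster).

    Assumption: per-core speed is proportional to clock frequency and
    memory bandwidth is never the bottleneck.
    """
    serial_fraction = 1.0 - parallel_fraction
    return (serial_fraction + parallel_fraction / cores) / clock_ghz


def faster_option(parallel_fraction, options):
    """Return the name of the option with the lowest estimated runtime."""
    return min(
        options,
        key=lambda name: relative_runtime(parallel_fraction, *options[name]),
    )


# (total cores per node, clock in GHz) for two dual-socket node types
OPTIONS = {
    "8 cores @ 3.5GHz": (8, 3.5),
    "12 cores @ 2.8GHz": (12, 2.8),
}

if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the crossover.
    for p in (0.50, 0.90, 0.95, 0.99):
        print(f"parallel fraction {p:.2f}: {faster_option(p, OPTIONS)}")
```

in this toy model the node with more, slower cores only wins once the
code is parallel to a very high degree; real codes add memory
bandwidth and communication on top of that.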

to give an example: we purchased at the same time dual socket nodes
that had the same mainboard, but either 2x 3.5GHz quad-core or 2x
2.8GHz hex-core CPUs. 3.5GHz was the fastest clock available at the
time. for classical MD, i get better performance out of the 12-core
nodes; for plane-wave DFT, i get about the same performance out of
both; for CP2k, i get better performance with the 8-core nodes (in
fact, CP2k runs fastest on the 12-core nodes when using only 8 of the
cores). now, the cost of the 2.8GHz CPUs is significantly lower, so
that is why we procured the majority of the cluster with those. but we
do have applications that scale worse than CP2k or are serial, yet
require high per-core memory bandwidth, so we got a few of the 3.5GHz
nodes, too. since those are already expensive, we filled them with as
much RAM as possible without causing the memory bus to be
underclocked; in turn, we put "only" 1GB/core into the 12-core nodes.
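
the arithmetic behind this comparison can be sketched as follows. the
core counts and clocks are the ones given above; the total node
memory bandwidth is a purely hypothetical placeholder, and the sketch
assumes both node types get the same total bandwidth, which need not
hold in practice. the helper name is made up for illustration.

```python
# Hypothetical placeholder, NOT a measured value for these nodes.
ASSUMED_NODE_BANDWIDTH_GBS = 100.0

# (sockets, cores per socket, clock in GHz), taken from the text above
NODES = {
    "2x quad-core 3.5GHz": (2, 4, 3.5),
    "2x hex-core 2.8GHz": (2, 6, 2.8),
}


def summarize(sockets, cores_per_socket, clock_ghz, node_bandwidth_gbs):
    """Return core count, aggregate clock, and per-core bandwidth share."""
    cores = sockets * cores_per_socket
    return {
        "cores": cores,
        "aggregate_ghz": cores * clock_ghz,
        "bandwidth_per_core_gbs": node_bandwidth_gbs / cores,
    }


if __name__ == "__main__":
    for name, (sockets, cores_per_socket, clock_ghz) in NODES.items():
        s = summarize(sockets, cores_per_socket, clock_ghz,
                      ASSUMED_NODE_BANDWIDTH_GBS)
        print(f"{name}: {s['cores']} cores, "
              f"{s['aggregate_ghz']:.1f} GHz aggregate, "
              f"{s['bandwidth_per_core_gbs']:.2f} GB/s per core")
```

whatever the real bandwidth is, under the equal-bandwidth assumption
each core of the 8-core node gets 12/8 = 1.5 times the share of a core
in the 12-core node, while the 12-core node has the higher aggregate
clock (12 x 2.8 vs. 8 x 3.5).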

so it all boils down to finding the right balance and adjusting it to
the application mix that you are running. last time i checked the
intel spec sheets, it looked as if the best deal was to be had for
CPUs with the second largest number of cores and a clock just high
enough to get the full memory bus speed. that will also keep the
heat in check, as the highest clocked CPUs usually have a much higher
TDP (>50% more), which puts a much larger demand on cooling and
power and will incur additional indirect costs as well.

HTH,
    axel.


>
>> Stick with the cpu. For QE you should be best off with intel. Also you are
>> likely to get the best price/performance ratio with CPUs that have less
>> than the maximum number of cpu cores and a higher clock instead.
>
>
> Eduardo Menendez Proupin
> Departamento de Fisica, Facultad de Ciencias, Universidad de Chile
> URL: http://www.gnm.cl/emenendez
>
> “Science may be described as the art of systematic oversimplification.” Karl
> Popper
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum



-- 
Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.



