[Pw_forum] Problem with QE 4.2.1 and AMD Opteron 6200 / 6300

Tue Dec 17 21:22:19 CET 2013

On 12-12-2013 08:20, Ivan Girotto wrote:
> Dear Fabricio,
>
> I reckon there is some inconsistency in the results you are obtaining.
> The AMD 6380 is a 16-core model. I'm wondering how do you map the
> process affinity while running on 8 cores.
> Without controlling such mapping you can obtain substantial performance
> variation at each execution.
> Indeed, cache memory and FPU are shared among a given set of cores and
> the increasing concurrency on shared resources goes along with a
> degradation of the performances.

I tried using 'taskset' and 'hwloc-bind', and the results were *worse*
than without either of them.

binary 1:
intel 13.2 + mkl 11.0 + openmpi 1.6.5 / 8 cores

binary 2:
gfortran 4.6 + openblas 0.2.8 + openmpi 1.6.5 / 8 cores

binary 1 time = 1h8m
binary 2 time = 46m57.62s

binary 1 with hwloc = 1h22m
binary 2 with hwloc = 56m40.50s

binary 1 with taskset = 2h27m
binary 2 with taskset = 1h48m
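
To make the timings above easier to compare, here is a minimal Python sketch that converts them to seconds and prints each bound run as a multiple of the corresponding unbound run. The helper names (to_seconds, slowdown) and the "unbound" label are mine, chosen for illustration only; the numbers are the ones listed above.

```python
# Wall-clock times copied from the list above, converted to seconds.
def to_seconds(hours=0, minutes=0, seconds=0.0):
    return hours * 3600 + minutes * 60 + seconds


timings = {
    "binary 1": {
        "unbound": to_seconds(hours=1, minutes=8),
        "hwloc": to_seconds(hours=1, minutes=22),
        "taskset": to_seconds(hours=2, minutes=27),
    },
    "binary 2": {
        "unbound": to_seconds(minutes=46, seconds=57.62),
        "hwloc": to_seconds(minutes=56, seconds=40.50),
        "taskset": to_seconds(hours=1, minutes=48),
    },
}


def slowdown(binary, mode):
    """Ratio of the bound run time to the unbound run time."""
    runs = timings[binary]
    return runs[mode] / runs["unbound"]


if __name__ == "__main__":
    for binary in timings:
        for mode in ("hwloc", "taskset"):
            print(f"{binary} with {mode}: {slowdown(binary, mode):.2f}x")
```

With these numbers, hwloc-bind comes out at roughly 1.2x the unbound time for both binaries, and taskset at roughly 2.2x to 2.3x.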

As I understand it, both tools are adding overhead to the execution, which
I'm sure is not what either of them is intended to do.
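
One way to check which cores a process actually ends up on under taskset or hwloc-bind is to print its affinity mask. The Python sketch below does that, and also lists one core per shared module for a 16-core chip. It is only a sketch under two assumptions that are mine: that consecutive core IDs (0-1, 2-3, ...) share a module, which should be verified against the real topology (for example with hwloc's lstopo), and that os.sched_getaffinity is available, which is only the case on Linux. The function names are hypothetical.

```python
import os


def one_core_per_module(n_cores, cores_per_module=2):
    """Return the first core ID of each module.

    ASSUMPTION: cores are numbered so that consecutive IDs share a module
    (0-1, 2-3, ...). Verify the real layout before relying on this.
    """
    return list(range(0, n_cores, cores_per_module))


def current_affinity():
    """Return the sorted core IDs this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    print("one core per module:", one_core_per_module(16))
    print("current affinity:", current_affinity())
```

Running the script under the same taskset or hwloc-bind prefix used for the real job shows the mask that is in effect for it.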

> The AMD 6380 also supports the AVX instruction set extension for vector
> operations at 256bit. Does your O.S. support that too?
> Compile a simple source with -mavx and see whether you can run it. Or
> check if the "avx" flag is present in your /proc/cpuinfo.

I tried that too, but it still wasn't enough to make binary 1 faster.
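
For anyone who wants to repeat the /proc/cpuinfo check mentioned in the quoted text, here is a minimal Python sketch. The function name has_cpu_flag is hypothetical, and it assumes the usual Linux layout where each processor entry has a "flags" line with space-separated flag names.

```python
def has_cpu_flag(flag, cpuinfo_text):
    """Return True if `flag` appears on any 'flags' line of the cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match whole flag names only, so "avx" does not match "avx2".
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is not readable: report the flag as absent.
        text = ""
    print("avx present:", has_cpu_flag("avx", text))
```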

I'm not sure what to make of these results. Any clues?

TIA,
Fabricio