[Pw_forum] Problem with QE 4.2.1 and AMD Opteron 6200 / 6300
Fabricio Cannini
fcannini at gmail.com
Tue Dec 17 21:22:19 CET 2013
On 12-12-2013 08:20, Ivan Girotto wrote:
> Dear Fabricio,
>
> I reckon there is some inconsistency in the results you are obtaining.
> The AMD 6380 is a 16-core model. I'm wondering how you map process
> affinity when running on 8 cores.
> Without controlling that mapping, performance can vary substantially
> from one run to the next.
> Indeed, cache memory and the FPU are shared among groups of cores, and
> increasing contention for those shared resources degrades performance.
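For illustration, here is a minimal Python sketch of one way to choose cores for 8 ranks on a 16-core Opteron 6380 so that no two ranks share a Piledriver module, and therefore no two share that module's FPU and L2 cache. It assumes, hypothetically, that Linux numbers the two cores of each module consecutively (0-1, 2-3, ...). That numbering is not guaranteed, so check the real topology with hwloc's `lstopo` before relying on it. The helper names `one_core_per_module` and `pin_current_process` are made up for this example. `os.sched_setaffinity` is the standard Linux-only call.

```python
import os


def one_core_per_module(n_ranks, cores_per_module=2, first_core=0):
    """Return logical CPU ids that place each rank on a different module.

    Hypothetical layout: the cores of a module are numbered consecutively
    (0-1, 2-3, ...). Verify with `lstopo` before using this on real hardware.
    """
    return [first_core + i * cores_per_module for i in range(n_ranks)]


def pin_current_process(cpu):
    """Linux-only: restrict the calling process (pid 0 = self) to one CPU."""
    os.sched_setaffinity(0, {cpu})


if __name__ == "__main__":
    # 8 ranks on a 16-core Opteron 6380: take the first core of each of the
    # 8 modules, leaving every module's second core idle.
    print(one_core_per_module(8))
```

Each rank would call `pin_current_process` with the entry for its own rank. In practice, `mpirun`'s built-in binding options are usually the simpler route to the same placement.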
I tried using 'taskset' and 'hwloc-bind', and the results were *worse*
than without either of them:
binary 1: intel 13.2 + mkl 11.0 + openmpi 1.6.5 / 8 cores
binary 2: gfortran 4.6 + openblas 0.2.8 + openmpi 1.6.5 / 8 cores

                 binary 1   binary 2
no binding       1h8m       46m57.62s
hwloc-bind       1h22m      56m40.50s
taskset          2h27m      1h48m
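To put these timings on a common scale, the following small Python sketch converts each reported duration to seconds and divides it by the unbound run of the same binary. The parsing helper `to_seconds` is hypothetical and only handles the `XhYmZs` forms used in this message.

```python
import re


def to_seconds(t):
    """Convert durations such as '1h8m' or '46m57.62s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:([\d.]+)s)?", t)
    if not t or m is None:
        raise ValueError(f"unrecognised duration: {t!r}")
    h, mins, s = m.groups()
    return int(h or 0) * 3600 + int(mins or 0) * 60 + float(s or 0)


# Wall-clock times as reported in this message.
times = {
    "binary 1": {"none": "1h8m", "hwloc-bind": "1h22m", "taskset": "2h27m"},
    "binary 2": {"none": "46m57.62s", "hwloc-bind": "56m40.50s", "taskset": "1h48m"},
}


def slowdowns(runs):
    """Ratio of each run's time to the unbound ('none') run, to 2 decimals."""
    base = to_seconds(runs["none"])
    return {k: round(to_seconds(v) / base, 2) for k, v in runs.items()}


if __name__ == "__main__":
    for name, runs in times.items():
        print(name, slowdowns(runs))
```

By this arithmetic, hwloc-bind slows both binaries by about 21%. Taskset slows binary 1 by about 2.16x and binary 2 by about 2.3x. Without binding, binary 1 is about 1.45x slower than binary 2 (4080 s vs. 2817.62 s). The binding penalty therefore looks similar for both toolchains and appears to be separate from the gap between them.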
As I understand it, both tools are adding overhead to the run, which is
surely not what either of them is meant to do.
> The AMD 6380 also supports the AVX instruction set extension for 256-bit
> vector operations. Does your OS support it too?
> Compile a simple source file with -mavx and see whether you can run it, or
> check whether the "avx" flag is present in your /proc/cpuinfo.
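The /proc/cpuinfo check can be scripted. The following is a minimal Python sketch that assumes the usual Linux format, in which a `flags : ...` line lists space-separated features. The function names are hypothetical. The function matches whole tokens, so `avx2` or `avx512f` is not mistaken for `avx`. Compiling and running a small `-mavx` program remains the more direct test of whether the OS actually lets AVX code run.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if 'avx' appears as a whole token in the flags line."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    # Linux-only: read the live CPU description.
    with open("/proc/cpuinfo") as f:
        print("avx flag present:", has_avx(f.read()))
```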
I tried that too, but it still wasn't enough to make binary 1 faster.
I'm not sure what to make of these results. Any clues?
TIA,
Fabricio