[QE-users] Sub optimal performance on 32 core AMD machine

Pamela Whitfield whitfieldps1 at gmail.com
Tue Nov 17 21:20:14 CET 2020


I am but a humble chemist - not a computational scientist. I believe Michal
is also an experimentalist (his name seems to ring a bell) looking for
practical pointers to tackle issues with Zen2 I've already dealt with.

My old workstation is a dual E5-2690 with 96GB RAM, which I used with
v6.4.1 and with the GPU versions (v6.1/v6.4.1) on a Quadro K6000.
Thread-pinning did have some benefits with this system.

I think I did mention that I only use 20 of the 24 cores available for the
3960X, but if not I apologise. I tried disabling hyperthreading in the BIOS
just because it was a simple 5-minute test; it has been known to help with
some programs in the past, so it was worth a shot. I'm not overly interested in the
very high core-count CPUs as my primary analysis software in Windows is
more sensitive to clock-speed. Zen3 Threadripper might change that but for
now 24 cores is enough.

The 3960X blows the 16 cores of the dual Xeons out of the water no matter
which compiler I use (gcc9, gcc10, PGI 19.1, 19.4, HPC-SDK 2020) in an
OpenMPI-only build (v3 or 4). I have no complaints, as I solved the issues
I had to my satisfaction.
The behaviour with the GV100 handing off work to the CPU is a little
annoying at times but obscenely fast if the input file is set up correctly.
Hyperthreading is necessary for the GPU version, which works fine for DFTD2
but not for DFTD3; hence I use a CPU-only, MPI-only version for the more
complex dispersion corrections.

My problems are probably very similar to Michal's. A 2500 cubic Angstrom cell
is not that unusual, so my runs are often measured in hours/days and not
minutes. I don't think any sane person is going to buy a Quadro GV100 to
save a few seconds!

I'll now shut up and go away into my experimentalist corner....

Best regards
Pam Whitfield

Hello Pamela,
I don't know whether it is clear or not, so I apologize if I repeat obvious
concepts.
I just bought a Threadripper 3990X with 64 cores / 128 threads. As far as I
remember, the 3960X should have 24 cores / 48 threads.
It is very important not to use more than 24 cores on the 3960X. Simply
forget about hyperthreading. There is no need to disable it in the BIOS; simply
count the real number of cores.
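
As a rough illustration of "count the real number of cores", the sketch
below counts physical cores on Linux by parsing the output of
`lscpu -p=CORE,SOCKET` and builds an MPI launch line with that many ranks.
The helper names and the input file name `scf.in` are hypothetical and are
not part of Quantum ESPRESSO; the sketch assumes a Linux machine with
`lscpu` available and an MPI-only `pw.x` build.

```python
import subprocess


def count_physical_cores(lscpu_text):
    """Count unique (core, socket) pairs in `lscpu -p=CORE,SOCKET` output.

    With SMT/hyperthreading enabled, each physical core is listed once per
    hardware thread, so duplicates are collapsed with a set.
    """
    cores = set()
    for line in lscpu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and header lines
            continue
        core, socket = line.split(",")[:2]
        cores.add((core, socket))
    return len(cores)


def detect_physical_cores():
    """Run lscpu (Linux only) and return the physical core count."""
    out = subprocess.run(
        ["lscpu", "-p=CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_physical_cores(out)


def mpirun_command(n_ranks, input_file="scf.in"):
    """Build an MPI-only launch line; `scf.in` is a placeholder name."""
    return ["mpirun", "-np", str(n_ranks), "pw.x", "-in", input_file]
```

On a 3960X with hyperthreading left on, `detect_physical_cores()` would be
expected to report the physical core count rather than the thread count, and
`" ".join(mpirun_command(detect_physical_cores()))` gives a command line that
never oversubscribes the real cores. Using fewer ranks than that (for
example 20) remains a manual choice.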

I use gcc 9.3.0, and the new gcc 10 should be even better for AMD CPUs.
With OpenBLAS 0.3.12 I found that my 8-core home Ryzen 3800X is as fast as a
12-core Xeon E5-2680 using Quantum ESPRESSO 6.4.1.
Carlo