[Pw_forum] Energy variations in noncolin-constrain_total.in with OpenMP & MKL

Nick Wilson nw.qeforge.5211 at family-wilson.me.uk
Tue Jan 26 12:23:49 CET 2016

I’ve been testing the OpenMP build of Quantum ESPRESSO 5.3.0 on our system, using the Intel compiler and MKL, and have a question about how the energy varies with the number of OpenMP threads used.
I ran all the plane-wave tests in the test-suite directory using between 1 and 16 OpenMP threads. They all gave consistent results apart from pw_noncolin/noncolin-constrain_total.in, whose energy varied between -55.54478325 Ry and -55.54478414 Ry.
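To put the spread in perspective, here is a quick calculation using only the two energies quoted above. The Ry-to-eV factor is the CODATA 2018 Rydberg energy; nothing else is assumed.

```python
# Extreme total energies (Ry) reported for noncolin-constrain_total.in across 1-16 threads
e_low = -55.54478414
e_high = -55.54478325

RY_TO_EV = 13.605693122994  # CODATA 2018 Rydberg energy in eV

spread_ry = e_high - e_low             # absolute spread in Ry
spread_mev = spread_ry * RY_TO_EV * 1000.0  # same spread in meV
relative = spread_ry / abs(e_low)      # spread relative to the total energy

print(f"spread   = {spread_ry:.2e} Ry")
print(f"         = {spread_mev:.4f} meV")
print(f"relative = {relative:.1e}")
```

So the variation is just under 1e-6 Ry (about 0.012 meV), roughly 1.6e-8 of the total energy.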

I ran the test through the Intel Inspector tool but that didn’t show up any threading deadlocks or data races.
I dropped the compiler optimisation to -O0 and added the “-fp-model strict” and “-fp-model source” compiler flags but that had no effect.
I tried using some of the relevant environment variables (KMP_DETERMINISTIC_REDUCTION=1 and MKL_CBWR=COMPATIBLE) which also had no effect.
Switching to the internal BLAS library resolved the issue, so it appears to be MKL-related. It’s present with both the GNU and Intel compilers.
I dropped back to an earlier version of MKL but the effect was still present.
As it was thread-related, I tried linking against the sequential version of MKL, but that didn’t help.
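One generic mechanism behind thread-count-dependent results, which KMP_DETERMINISTIC_REDUCTION and MKL_CBWR are meant to address, is that floating-point addition is not associative, so splitting a reduction into a different number of chunks can change the last bits of the sum. The sketch below is only an illustration of that effect, not a diagnosis of this test case: the `chunked_sum` helper and the random data are hypothetical and stand in for a static OpenMP-style reduction.

```python
import math
import random


def chunked_sum(values, n_chunks):
    """Hypothetical model of a static parallel reduction: each 'thread'
    sums a contiguous chunk, then the partial sums are added in order."""
    n = len(values)
    bounds = [n * k // n_chunks for k in range(n_chunks + 1)]
    partials = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        s = 0.0
        for v in values[lo:hi]:
            s += v  # plain left-to-right accumulation within a chunk
        partials.append(s)
    total = 0.0
    for p in partials:
        total += p  # combine the per-thread partial sums
    return total


# Made-up data spanning many orders of magnitude, to make rounding visible
random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 2)
          for _ in range(100_000)]

reference = math.fsum(values)  # correctly rounded sum, for comparison
results = {}
for n_threads in (1, 2, 4, 8, 16):
    results[n_threads] = chunked_sum(values, n_threads)
    print(n_threads, repr(results[n_threads]),
          "diff from fsum:", results[n_threads] - reference)
```

Each chunking gives a result that is accurate but typically differs in the trailing digits, which is the kind of small, thread-count-dependent scatter seen above.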
So, I guess my questions are:
Should the results always be independent of the number of OpenMP threads?
Is there anything unique about the noncolin-constrain_total.in test case which would cause it to behave differently to the rest of the tests?
Best regards,
Nick Wilson
System details:
  Intel Sandy Bridge E5-2650 CPU
  CentOS Linux release 7.2.1511
  MKL from Intel compiler 16.0.0
  GNU compiler version 4.8.5
