[QE-users] QE 6.4 - slower with intel fftw? how to properly benchmark
Christoph Wolf
wolf.christoph at qns.science
Fri Mar 1 11:13:55 CET 2019
Dear all,
Please forgive this "beginner" question, but I am facing a weird problem.
When compiling qe-6.4 (Intel compiler, Intel MPI + OpenMP) with or without
Intel's FFTW libraries, I find that with OpenMP and 2 threads per core the Intel
FFTW version is roughly "twice as slow" as the internal one, judging by the CPU times:
"internal"
General routines
calbec : 2.69s CPU 2.70s WALL ( 382 calls)
fft : 0.47s CPU 0.47s WALL ( 122 calls)
ffts : 0.05s CPU 0.05s WALL ( 12 calls)
fftw : 49.97s CPU 50.12s WALL ( 14648 calls)
Parallel routines
PWSCF : 1m45.03s CPU 1m46.59s WALL
"intel fftw"
General routines
calbec : 6.36s CPU 3.20s WALL ( 382 calls)
fft : 0.93s CPU 0.47s WALL ( 121 calls)
ffts : 0.10s CPU 0.05s WALL ( 12 calls)
fftw : 109.63s CPU 55.23s WALL ( 14648 calls)
Parallel routines
PWSCF : 3m18.32s CPU 1m41.01s WALL
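One way to read the two timing reports above is to compare CPU time with WALL time rather than looking at CPU time alone. CPU time is presumably accumulated over all threads of a process, so a CPU/WALL ratio near 2 suggests that two threads per process were busy, while the WALL times of the two builds are close (fftw: 50.12s vs 55.23s; PWSCF: 1m46.59s vs 1m41.01s). The small bash sketch below makes that arithmetic explicit; to_seconds and cpu_wall_ratio are hypothetical helper names, not QE tools, and only the h/m/s time formats seen in these reports are handled.

```bash
#!/usr/bin/env bash
# Convert a QE time string such as "1m45.03s" or "49.97s" to seconds.
# Assumption: only the h/m/s pieces seen in these reports occur.
to_seconds() {
  echo "$1" | awk '{
    t = 0
    if (match($0, /[0-9.]+h/)) t += substr($0, RSTART, RLENGTH - 1) * 3600
    if (match($0, /[0-9.]+m/)) t += substr($0, RSTART, RLENGTH - 1) * 60
    if (match($0, /[0-9.]+s/)) t += substr($0, RSTART, RLENGTH - 1)
    printf "%.2f\n", t
  }'
}

# CPU/WALL ratio: about 1 for one busy thread per process, approaching 2
# when two threads per process are busy (assuming CPU time sums over threads).
cpu_wall_ratio() {
  local cpu wall
  cpu=$(to_seconds "$1")
  wall=$(to_seconds "$2")
  awk -v c="$cpu" -v w="$wall" 'BEGIN { printf "%.2f\n", c / w }'
}

cpu_wall_ratio 49.97s 50.12s        # "internal" fftw line: prints 1.00
cpu_wall_ratio 109.63s 55.23s       # "intel fftw" fftw line: prints 1.98
cpu_wall_ratio 3m18.32s 1m41.01s    # "intel fftw" PWSCF total: prints 1.96
```

Under that reading, the Intel FFTW build is not twice as slow in elapsed time; it simply reports CPU time from two threads.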
As a benchmark I am running a perovskite with 120 k-points on 30 processors
(one node). There is no noticeable difference if I export
OMP_NUM_THREADS=1 (MPI only), so I guess I made some mistake during the
build with regard to the libraries.
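A common way to check whether OpenMP helps at all is to keep the node fully used but not oversubscribed, i.e. MPI ranks x OpenMP threads = physical cores, and compare the PWSCF WALL time across runs (2 threads per core on 30 cores would mean 60 threads, which relies on hyperthreading). The bash sketch below only prints the commands as a dry run; the 30-core count, Intel MPI's mpirun, the input name perovskite.in and the helper name sweep are all assumptions for illustration.

```bash
#!/usr/bin/env bash
# Dry run: print pw.x runs with MPI ranks x OpenMP threads = physical cores.
# Assumptions: 30 physical cores, Intel MPI's mpirun, hypothetical input file.
CORES=30
INPUT=perovskite.in   # hypothetical input file name

sweep() {
  local threads ranks
  for threads in "$@"; do
    if (( CORES % threads != 0 )); then
      echo "skip: $threads does not divide $CORES" >&2
      continue
    fi
    ranks=$(( CORES / threads ))
    # One output file per run so the PWSCF WALL lines can be compared later.
    # (With Intel MPI, "mpirun -genv OMP_NUM_THREADS ..." is an alternative.)
    echo "OMP_NUM_THREADS=$threads mpirun -np $ranks pw.x -i $INPUT > out.np${ranks}.omp${threads}"
  done
}

sweep 1 2 3 4
```

Only the thread counts that divide the core count are kept, so every printed run uses the same 30 cores and the WALL times are directly comparable.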
The build process is as follows:
module load intel19/compiler-19
module load intel19/impi-19
export FFT_LIBS="-L$MKLROOT/intel64"
export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"
export CC=icc FC=ifort F77=ifort MPIF90=mpiifort MPICC=mpiicc
./configure --enable-parallel --with-scalapack=intel --enable-openmp
This detects BLAS_LIBS, LAPACK_LIBS, SCALAPACK_LIBS and FFT_LIBS.
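For comparison, here is a hedged sketch of explicit MKL link variables for configure. In the build above, FFT_LIBS only adds a library search path (there is no -l option), and in the standard MKL layout the libraries live in $MKLROOT/lib/intel64 rather than $MKLROOT/intel64, so that setting may not link anything by itself; likewise -lmkl_blacs_intelmpi_lp64 is a BLACS library, which belongs with ScaLAPACK rather than LAPACK. The library list follows the LP64 + Intel threading + Intel MPI pattern of Intel's MKL Link Line Advisor as far as we recall it, the /opt/intel/mkl fallback is a hypothetical path, and the assumption that MKL's FFTW3 interface is built into the main MKL libraries should be checked for your MKL version.

```bash
#!/usr/bin/env bash
# Sketch of explicit MKL link variables for QE's configure
# (Intel 19 compilers, Intel MPI, OpenMP). Assumptions: LP64 interface,
# threaded MKL (mkl_intel_thread + libiomp5) to match --enable-openmp,
# and the standard MKL directory layout.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"   # hypothetical fallback path
MKL_LIBDIR="$MKLROOT/lib/intel64"      # note: lib/intel64, not intel64

# BLAS and LAPACK both come from the threaded MKL libraries.
export BLAS_LIBS="-L$MKL_LIBDIR -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"
export LAPACK_LIBS="$BLAS_LIBS"

# BLACS belongs with ScaLAPACK, not with LAPACK.
export SCALAPACK_LIBS="-L$MKL_LIBDIR -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64"

# Assumption: MKL's FFTW3 interface lives in the same MKL libraries,
# so FFT_LIBS needs the -l options too, not just a -L search path.
export FFT_LIBS="$BLAS_LIBS"

export CC=icc FC=ifort F77=ifort MPIF90=mpiifort MPICC=mpiicc

# Show what configure will see; the configure call itself is left as a comment.
printf '%s=%s\n' BLAS_LIBS "$BLAS_LIBS" SCALAPACK_LIBS "$SCALAPACK_LIBS" FFT_LIBS "$FFT_LIBS"
# ./configure --enable-parallel --with-scalapack=intel --enable-openmp
```

After running configure, the generated make.inc shows which FFT driver and libraries were actually picked up, which is worth checking before any timing comparison.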
I am not experienced with benchmarking, so if my benchmark is garbage, please
suggest a suitable system!
Thanks in advance!
Chris
--
Postdoctoral Researcher
Center for Quantum Nanoscience, Institute for Basic Science
Ewha Womans University, Seoul, South Korea