[QE-users] QE 6.4 - slower with intel fftw? how to properly benchmark

Pietro Davide Delugas pdelugas at sissa.it
Fri Mar 1 12:49:40 CET 2019


Hi Chris
it might be happening exactly the opposite way.
If you don't specify anything, configure tries all the options from 
best to worst, and, if I am not wrong, MKL is tested as the first 
guess. If you pass it a specific path, it tries only that one and 
treats it as an ordinary FFTW library, so it may fail to find a 
working FFT and fall back to the internal one.
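A quick way to see which FFT driver configure actually picked is to inspect the DFLAGS line in make.inc (a sketch, assuming a standard QE 6.x source tree; QE records the FFT backend choice at configure time as a preprocessor flag):

```shell
# Run from the top of the QE source tree after ./configure.
# The DFLAGS line in make.inc tells you which FFT backend was selected:
#   -D__DFTI   -> Intel MKL (DFTI interface)
#   -D__FFTW3  -> external FFTW3 library
#   -D__FFTW   -> QE's bundled internal FFTW
grep -E '^ *(DFLAGS|FFT_LIBS)' make.inc
```

Comparing this line between the two builds shows immediately whether the "intel fftw" build really linked MKL's FFT or silently fell back to the internal one.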

Could you send the make.inc files for the two cases, or the config.log?
Pietro

On 03/01/2019 11:13 AM, Christoph Wolf wrote:
> Dear all,
>
> please forgive this "beginner" question, but I am facing a weird 
> problem. When compiling qe-6.4 (Intel compiler, Intel MPI + OpenMP) 
> with or without Intel's FFTW libraries, I find that, with OpenMP and 
> two threads per core, the Intel FFTW version is roughly twice as slow 
> as the internal one:
>
> "internal"
>      General routines
>      calbec       :      2.69s CPU      2.70s WALL (    382 calls)
>      fft          :      0.47s CPU      0.47s WALL (    122 calls)
>      ffts         :      0.05s CPU      0.05s WALL (     12 calls)
>      fftw         :     49.97s CPU     50.12s WALL (  14648 calls)
>      Parallel routines
>      PWSCF        :  1m45.03s CPU     1m46.59s WALL
>
> "intel fftw"
>      General routines
>      calbec       :      6.36s CPU      3.20s WALL (     382 calls)
>      fft          :      0.93s CPU      0.47s WALL (     121 calls)
>      ffts         :      0.10s CPU      0.05s WALL (      12 calls)
>      fftw         :    109.63s CPU     55.23s WALL (   14648 calls)
>      Parallel routines
>      PWSCF        :   3m18.32s CPU   1m41.01s WALL
>
> As a benchmark I am running a perovskite with 120 k-points on 30 
> processors (one node). There is no (noticeable) difference if I export 
> OMP_NUM_THREADS=1 (MPI only), so I guess I made some mistake with the 
> libraries during the build.
>
> The build process is as follows:
>
> module load intel19/compiler-19
>
> module load intel19/impi-19
>
>
> export FFT_LIBS="-L$MKLROOT/intel64"
>
> export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"
>
> export CC=icc FC=ifort F77=ifort MPIF90=mpiifort MPICC=mpiicc
>
>
> ./configure --enable-parallel --with-scalapack=intel --enable-openmp
>
>
> This detects BLAS_LIBS, LAPACK_LIBS, SCALAPACK_LIBS and FFT_LIBS.
>
> I am not experienced with benchmarking, so if my benchmark is garbage 
> please suggest a more suitable system!
>
> Thanks in advance!
> Chris
>
> -- 
> Postdoctoral Researcher
> Center for Quantum Nanoscience, Institute for Basic Science
> Ewha Womans University, Seoul, South Korea
>
>
> _______________________________________________
> users mailing list
> users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users

