[Pw_forum] a strange behavior for QE-v4 compared with QE-v3

JR Schmidt schmidt at chem.wisc.edu
Wed Nov 26 15:52:52 CET 2008

I ran into this same issue. For whatever reason, setting OMP_NUM_THREADS
or MKL_NUM_THREADS did not seem to fix the problem: MKL was still
creating many threads per process, though perhaps only one thread was
active at a time.
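
For reference, here is a minimal sketch of forcing both variables to 1 in the environment handed to each launched job. It assumes jobs are started from a Python script, and the helper name single_thread_env is hypothetical. As described above, this alone did not stop MKL from spawning threads here.

```python
import os


def single_thread_env(base=None):
    """Return a copy of an environment with OpenMP and MKL limited to one thread.

    Hypothetical helper: pass the result as env= to subprocess.run() when
    launching each job. The caller's environment is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"  # generic OpenMP thread count
    env["MKL_NUM_THREADS"] = "1"  # MKL-specific setting, mentioned in the post
    return env


if __name__ == "__main__":
    env = single_thread_env()
    print(env["OMP_NUM_THREADS"], env["MKL_NUM_THREADS"])
```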

I found that a better solution, offering increased performance at least
for parallel jobs, was to link with the serial (sequential) MKL library.
This can be done by modifying the following lines in make.sys (for MKL 10):

BLAS_LIBS =  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
LAPACK_LIBS =  -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

then doing a "make clean" (probably unnecessary) and recompiling, so that
the executables are relinked against the sequential library.
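
If you maintain several builds, the make.sys change can also be scripted. The following is a minimal sketch, assuming make.sys uses single-line "NAME = value" assignments. The function name use_sequential_mkl is hypothetical, and the link line is the MKL 10 one given above.

```python
import re

# Sequential (non-threaded) MKL 10 link line, as given in the post.
SEQUENTIAL_MKL = "-lmkl_intel_lp64 -lmkl_sequential -lmkl_core"

# Match only assignments to exactly BLAS_LIBS or LAPACK_LIBS; a name such as
# BLAS_LIBS_SWITCH does not match because "=" must follow the name directly
# (optionally after whitespace).
_LIBS_LINE = re.compile(r"^(BLAS_LIBS|LAPACK_LIBS)\s*=.*$", re.MULTILINE)


def use_sequential_mkl(make_sys: str) -> str:
    """Rewrite the BLAS_LIBS and LAPACK_LIBS lines to use sequential MKL.

    Lines continued with a trailing backslash are not handled.
    """
    return _LIBS_LINE.sub(lambda m: f"{m.group(1)} =  {SEQUENTIAL_MKL}", make_sys)
```

Usage would be to read make.sys, pass its text through use_sequential_mkl, write it back, and then rebuild. Other variables whose names merely start with BLAS_LIBS (for example a BLAS_LIBS_SWITCH line, if present) are left untouched.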

Although this gives WORSE performance when you run only a single job
(since MKL no longer threads across the other cores), if you are trying
to fully utilize your nodes by running several jobs or parallel jobs,
using the threaded library results in a giant mess.

> this looks a lot like you are using two different versions
> of MKL for the two compiles and thus are another "victim"
> of the automatic threading of MKL v10. you should not
> have a process with >100% CPU with a regular compile.
> now if you have 8 cores and 8 MPI tasks and each of them
> threads across 8 cores, you have a) a severe overload of
> the scheduler and b) a big mess and all kinds of bad
> performance issues.
> try setting OMP_NUM_THREADS=1 and check if that
> changes the behavior.
J.R. Schmidt
Assistant Professor of Chemistry
Room 8305D
Department of Chemistry
University of Wisconsin-Madison
1101 University Ave
Madison, WI 53706

Phone: (608) 262-2996
Fax: (608) 262-9918
E-mail: schmidt at chem.wisc.edu
