<div dir="ltr"><div>1. You are looking at the wrong numbers. Compare the WALL times, not the CPU times. By WALL time the two builds perform about the same.</div><div>2. With 2 threads per core you are using hardware threads (HT), which share the resources of a physical core. On a few architectures HT does boost the performance of QE. On most architectures HT makes no difference or even hurts performance, because the hardware threads compete for the shared resources. The basic strategy is simply to try it: keep using HT if you gain something, and if not, don't use it.</div><div>3. The OpenMP implementation targets physical cores. I'm not saying you cannot run OpenMP threads on HT, but any performance claim is then mostly hardware related, not software related.<br></div><div>Ye<br></div><div><div><div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>

Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>

Argonne National Laboratory</div></div></div></div></div><br></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Christoph Wolf &lt;wolf.christoph@qns.science&gt; wrote on Fri, Mar 1, 2019 at 4:14 AM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Dear all,<div><br></div><div>Please forgive this "beginner" question, but I am facing a weird problem. When compiling qe-6.4 (Intel compiler, Intel MPI+OpenMP) with or without Intel's fftw libs, I find that with OpenMP and 2 threads per core the Intel fftw version is roughly "twice as slow" as the internal one.</div><div><br></div><div>"internal"</div><div><div>     General routines</div><div>     calbec       :      2.69s CPU      2.70s WALL (     382 calls)</div><div>     fft          :      0.47s CPU      0.47s WALL (     122 calls)</div><div>     ffts         :      0.05s CPU      0.05s WALL (      12 calls)</div><div>     fftw         :     49.97s CPU     50.12s WALL (   14648 calls)</div><div> </div><div>     Parallel routines</div><div> </div><div>     PWSCF        :  1m45.03s CPU     1m46.59s WALL</div><div><br></div><div>"intel fftw"</div><div><div>     General routines</div><div>     calbec       :      6.36s CPU      3.20s WALL (     382 calls)</div><div>     fft          :      0.93s CPU      0.47s WALL (     121 calls)</div><div>     ffts         :      0.10s CPU      0.05s WALL (      12 calls)</div><div>     fftw         :    109.63s CPU     55.23s WALL (   14648 calls)</div><div> </div><div>     Parallel routines</div><div> </div><div>     PWSCF        :   3m18.32s CPU   1m41.01s WALL</div></div><div><br></div><div>As a benchmark I am running a perovskite with 120 k-points on 30 processors (one node). There is no (noticeable) difference if I export OMP_NUM_THREADS=1 (only MPI), so I guess I made some mistake during the build with regards to the 
libraries.</div><div><br></div><div>Build process is as below</div><div><br></div><div><p>module load intel19/compiler-19</p><p>module load intel19/impi-19</p><p><br></p><p>export FFT_LIBS="-L$MKLROOT/intel64"</p><p>export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"</p><div><p>export CC=icc FC=ifort F77=ifort MPIF90=mpiifort MPICC=mpiicc</p><p><br></p><p>./configure --enable-parallel --with-scalapack=intel --enable-openmp</p><p><br></p>This detects BLAS_LIBS, LAPACK_LIBS, SCALAPACK_LIBS and FFT_LIBS.</div></div><div><br></div><div>I am not experienced with benchmarking so if my benchmark is garbage please suggest a suitable system!</div><div><br></div><div>Thanks in advance!</div><div>Chris </div><div><br></div>-- <br><div dir="ltr" class="gmail-m_-6607725773569025611gmail_signature"><div dir="ltr">Postdoctoral Researcher<br>Center for Quantum Nanoscience, Institute for Basic Science<br>Ewha Womans University, Seoul, South Korea</div></div></div></div></div></div>

</blockquote></div>
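<div>To make point 1 of the reply concrete, here is a minimal sketch that reads timing lines in the format shown in the quoted report and prints the CPU/WALL ratio per routine. The helper name <code>cpu_wall_ratio</code> is hypothetical, and the sample lines are copied from the "intel fftw" report above. It assumes, consistent with those numbers but not confirmed by the thread, that the CPU column is accumulated over OpenMP threads.</div>

```shell
#!/bin/sh
# Sketch: compare CPU and WALL time per routine in a pw.x timing report.
# Assumption: with OpenMP the CPU column sums over threads, so a ratio
# near 2 on 2 threads means both threads were busy, not a slower run.

# Sample lines copied from the "intel fftw" report in the quoted message.
report='     fft          :      0.93s CPU      0.47s WALL (     121 calls)
     fftw         :    109.63s CPU     55.23s WALL (   14648 calls)
     PWSCF        :   3m18.32s CPU   1m41.01s WALL'

# Hypothetical helper: reads timing lines on stdin, prints one summary per line.
cpu_wall_ratio() {
  awk '
    # Convert "55.23s" or "3m18.32s" to seconds.
    function secs(t,   parts, n) {
      sub(/s$/, "", t)
      n = split(t, parts, "m")
      return (n == 2) ? parts[1] * 60 + parts[2] : parts[1] + 0
    }
    # Match lines of the form: name : <time> CPU <time> WALL ...
    $2 == ":" && $4 == "CPU" && $6 == "WALL" {
      cpu = secs($3); wall = secs($5)
      if (wall > 0)
        printf "%-8s cpu=%.2f wall=%.2f ratio=%.2f\n", $1, cpu, wall, cpu / wall
    }
  '
}

printf '%s\n' "$report" | cpu_wall_ratio
```

<div>For the fftw line this gives a ratio of about 1.98, which is what one would expect from two busy threads. The WALL column is the number to compare between the two builds.</div>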