<div>Hello Pietro,</div><div> </div><div>Thanks for the advice. I was indeed experimenting with the actual GPU version of QE, taken from the page you mentioned.</div><div> </div><div>As for the FFT and linear algebra libraries (BLAS/LAPACK/ELPA, FFTW3): I have those, but compiled for the CPU. It seems they can be used for the GPU version as well, without modification; is that right? For some reason, I thought that the GPU version required GPU-compiled BLAS/LAPACK/FFT libraries, which is what the NVIDIA HPC SDK provides. But it seems that QE does not use them (or at least the configure script does not).</div><div> </div><div>Sergey</div><div> </div><div> </div><div>09.03.2021, 21:29, "Pietro Bonfa'" &lt;pietro.bonfa@unipr.it&gt;:</div><blockquote><p>Dear Sergey,<br /><br />some (trivial) advice:<br /><br />* version 6.7 detects accelerators but does not use them; the current<br />release of the accelerated version is here: <a href="https://gitlab.com/QEF/q-e-gpu" rel="noopener noreferrer">https://gitlab.com/QEF/q-e-gpu</a>.<br />The two codes have been merged, therefore the next release will include<br />GPU support as well.<br /><br />* You may get a little speedup by linking FFTW3, but most of the FFTs<br />are done on the GPU with cuFFT.<br /><br />* OpenMP should definitely be enabled, since it provides the way to fully<br />exploit the CPUs. Indeed, the number of *MPI processes* should be (as a<br />rule) equal to the number of GPUs (6 per node in your case).<br /><br />* CUDA-aware MPI is an experimental feature. I have used it extensively<br />without problems, though.<br /><br />Hope this helps,<br />Pietro<br /><br /><br />On 3/9/21 3:04 PM, Sergey Lisenkov wrote:</p><blockquote> Hello,<br /> I have access to an IBM Power9 cluster with 6 V100 GPUs/node and<br /> 40 CPU cores/node. 
I have a CPU version of QE-6.7 running, but I would<br /> like to explore the GPU version.<br /> We have NVIDIA compilers installed (PGI 21.2, CUDA 11.1, ESSL 6.2).<br />   When I ran the configure script as described on the Wiki page for<br /> QE-GPU, it created a 'make.inc' file with the internal FFTW and USE_CUSOLVER.<br /> Also, configure selects the BLAS/LAPACK libraries from PGI.<br /> Is this the way it should be? I see that there are cuBLAS, cuFFT and other<br /> CUDA libraries, but can they be used in QE? ESSL also has the<br /> "libesslsmcuda" library, but I don't know if it is relevant. All the<br /> examples on the QE-GPU Wiki page seem to be outdated, or I may be wrong.<br /> Also, since every compute node has 6 GPUs, I could use CUDA-aware MPI<br /> (enabled with the __GPU_MPI flag). Should I set the OMP_NUM_THREADS variable<br /> (=40) to utilize the CPU cores? BTW, for some reason the configure script does<br /> not activate OpenMP (even if --enable-openmp is used).<br /> Thanks,<br />   Sergey<br /> <br /> _______________________________________________<br /> Quantum ESPRESSO is supported by MaX (<a href="https://www.max-centre.eu/" rel="noopener noreferrer">www.max-centre.eu</a>)<br /> users mailing list <a href="mailto:users@lists.quantum-espresso.org" rel="noopener noreferrer">users@lists.quantum-espresso.org</a><br /> <a href="https://lists.quantum-espresso.org/mailman/listinfo/users" rel="noopener noreferrer">https://lists.quantum-espresso.org/mailman/listinfo/users</a><br /> </blockquote></blockquote>
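<div>The per-node layout suggested in the thread (one MPI process per GPU, with OpenMP threads covering the CPU cores) comes down to simple arithmetic. The Python sketch below is only illustrative: the helper name node_layout is hypothetical, and it assumes the physical cores are split evenly among the ranks with one OpenMP thread per core (no SMT, no cores reserved by the system).</div>

```python
from typing import Tuple


def node_layout(gpus_per_node: int, cores_per_node: int) -> Tuple[int, int, int]:
    """Return (mpi_ranks, omp_threads_per_rank, idle_cores) for one node.

    Hypothetical helper. Assumes one MPI rank per GPU (the rule of thumb
    from the thread) and an even split of physical cores among the ranks,
    with one OpenMP thread per core.
    """
    if gpus_per_node <= 0 or cores_per_node < gpus_per_node:
        raise ValueError("need at least one GPU and one CPU core per GPU")
    ranks = gpus_per_node                    # one MPI process per GPU
    threads = cores_per_node // ranks        # whole cores available to each rank
    idle = cores_per_node - ranks * threads  # cores left over by the even split
    return ranks, threads, idle


if __name__ == "__main__":
    ranks, threads, idle = node_layout(gpus_per_node=6, cores_per_node=40)
    print(f"MPI ranks per node: {ranks}")
    print(f"OMP_NUM_THREADS:    {threads}")
    print(f"idle cores:         {idle}")
```

<div>With 6 GPUs and 40 cores per node this gives 6 ranks with OMP_NUM_THREADS=6 each (4 cores left over), rather than 40 threads per rank, which would oversubscribe the node by a factor of six. How ranks and threads are bound to cores depends on the local MPI launcher and is not covered here.</div>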