[QE-users] GPU Q-E: how to use

Louis Stuber lstuber at nvidia.com
Wed Sep 9 13:01:11 CEST 2020


Dear Sergey,

I regularly run QE with multiple GPUs; here are answers to your questions:

  *   pw.x makes use of all CPU cores (through OpenMP) and of all GPUs (one GPU per MPI rank), so you have to make sure the CPU binding for both runtimes is properly set up (details below).
  *   To check that the system is correctly used, you can use “htop” to monitor core activity and “nvidia-smi” for GPU activity. You should see exactly 1 process per GPU in nvidia-smi.
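
For example, this is roughly what I check on a node while a job is running (just a sketch with standard tools; adapt to your environment):

    # in one terminal: all CPU cores should show activity (OpenMP threads)
    htop
    # in another: expect exactly one pw.x process per GPU
    watch -n 1 nvidia-smi
    # or list the compute processes per GPU directly:
    nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name --format=csv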

Here are some other guidelines:

  *   Run with as many MPI ranks as you have GPUs. For example, on your system: mpirun -n 6 pw.x
  *   Even though by default QE uses all CPUs and GPUs, you need to ensure MPI ranks don't try to access the same cores and GPUs at the same time, or you'll see a performance drop. I personally use this simple wrapper script: https://raw.githubusercontent.com/LStuber/mpi-binding/master/bind.sh  (I haven't tested it much on Power, but it should autodetect SMT properly). A minimal sketch of what such a wrapper does is shown after this list.
  *   Then I run with mpirun -n 6 ./bind.sh pw.x -i pw.in
  *   If your system supports CUDA-aware MPI, you should enable it by rebuilding QE with -D__GPU_MPI in make.inc, especially if your system has NVLink between GPUs (see the make.inc snippet after this list).
  *   If your input has multiple k-points, you should run pw.x -npool <nb of GPUs>, to assign one k-point (pool) per GPU; this is much more efficient than the default, where all GPUs work on one k-point at a time. The drawback is that this requires manual tweaking: GPU memory might not be enough, in which case the code will quickly crash. I always play a bit with -npool when running a new test case.
  *   To maximize performance, you can try to enable MPS on your node (nvidia-cuda-mps-control -d) and run more than 1 MPI rank per GPU. My script has an option for it:
mpirun -n 24 -x MPI_PER_GPU=4 ./bind.sh ../bin/pw.x -npool 6 -i pw.in
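
For reference, here is a minimal sketch of what such a binding wrapper does. This is not the bind.sh linked above, just an illustration: it assumes a launcher that exports OMPI_COMM_WORLD_LOCAL_RANK (Open MPI and IBM Spectrum MPI do), a node whose 2 CPU sockets are NUMA nodes 0 and 1 (check with numactl --hardware), and that numactl is installed:

    #!/bin/bash
    # bind_sketch.sh -- give each MPI rank its own GPU and NUMA domain
    LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}   # rank index on this node
    NGPUS=$(nvidia-smi --list-gpus | wc -l)       # GPUs visible on the node
    export CUDA_VISIBLE_DEVICES=$(( LOCAL_RANK % NGPUS ))
    # spread ranks over the two sockets so their OpenMP threads don't overlap
    SOCKET=$(( LOCAL_RANK % 2 ))
    exec numactl --cpunodebind=$SOCKET --membind=$SOCKET "$@"

You would launch it the same way, e.g. mpirun -n 6 ./bind_sketch.sh pw.x -i pw.in. The real script linked above also handles SMT and picks the GPU closest to each socket, which this sketch does not.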
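
Regarding -D__GPU_MPI: it goes into the preprocessor definitions in make.inc and requires a full rebuild. A sketch, assuming your make.inc has the usual MANUAL_DFLAGS variable that gets appended to DFLAGS (otherwise add the flag to the DFLAGS line directly):

    # in make.inc
    MANUAL_DFLAGS = -D__GPU_MPI
    # then rebuild
    make clean && make pw

Also make sure your MPI library itself was built with CUDA support, otherwise communication through GPU buffers will fail at run time.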

Regards,
Louis


From: users <users-bounces at lists.quantum-espresso.org> On Behalf Of Sergey Lisenkov
Sent: Sunday, September 6, 2020 12:16 AM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] GPU Q-E: how to use


Hi Michal,

I'm no expert in GPUs, so I didn't even know that the T4 is not a good card.

Anyway, this cluster also has other GPU cards:

.. has 22 training nodes each with 2 Power9 processors, 512GB memory and 6 nVidia V100 GPUs, 128 inference nodes with 2 Power9 processors, 256GB memory and 4 nVidia T4 GPUs, 2 vis nodes each with 2 Power9 processors, 512GB memory and 2 nVidia V100 GPUs (SRD not yet available).

I guess I can use QE on those "training nodes" that have V100 GPUs.


06.09.2020, 00:47, "Michal Krompiec" <michal.krompiec at gmail.com>:
Dear Sergey,
T4 won’t help you much, even if you manage to compile QE to work with it. You need a GPU with high double-precision performance, such as V100 or P100.
Best regards,
Michal Krompiec
Merck KGaA

On Sat, 5 Sep 2020 at 22:03, Sergey Lisenkov <proffess at yandex.ru> wrote:
Dear all,

I have access to our new cluster, an IBM Power9 machine (each node has 2 x 20 cores + 4 Nvidia T4 GPUs). It seems very similar to Marconi100, which many users here appear to be familiar with.

Anyway, I'm trying to use the GPU version of Quantum ESPRESSO, but I have almost no experience with it. I read the QE-GPU wiki and was able to compile it. We have the PGI and GNU compilers installed, IBM Spectrum MPI and OpenMPI, and the ESSL library.

Following some templates, I was able to compile pw.x using PGI/IBM Spectrum MPI, FFTW3, and ESSL. The CPU version is quite slow on this cluster (almost twice as slow as on a Cray XC40).

But I'm not sure how to run the GPU version correctly. Since each node has 4 GPUs, does that mean pw.x runs on only 4 GPUs while the 40 CPU cores sit idle? Unfortunately, our cluster does not have much documentation, so I use Google to figure out how to use such a system. We have LSF installed, unfortunately, while all similar systems use SLURM.

If somebody runs GPU Q-E, could you share some examples of how to run the code correctly?

Thanks,
 Sergey

_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu/quantum-espresso)
users mailing list: users at lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users

