[QE-users] Poor GPU scaling for Gamma-point-only calculation on multiple GPUs
Wang Xing
xing.wang at psi.ch
Tue May 13 08:47:24 CEST 2025
Hi all,
I’m running a Gamma-point-only SCF calculation for a nanoparticle system using Quantum ESPRESSO with GPU support. The HPC node has 4 NVIDIA GH200 GPUs, and I observe very poor scaling behavior when increasing the number of GPUs.
Setup
I use 1 MPI task per GPU, with 72 OpenMP threads per task:
* OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
* QE version: 7.4 (GPU build with OpenMP)
* MPI: OpenMPI (GPU-aware)
Here is the relevant SLURM configuration:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=72
#SBATCH --gpus=4
#SBATCH --gpu-bind=map_gpu:0,1,2,3
#SBATCH --time=00:30:00
module use Spack
module load nvhpc/24.11 openmpi/main-7zgw-GH200-gpu quantum-espresso/7.4-gpu-omp
srun pw.x -pd .true. -npool 1 -in aiida.in > aiida.out
Performance Results (time per SCF iteration)
Configuration        Time (sec)
1 task,  1 GPU       11.9
2 tasks, 2 GPUs      17.0
3 tasks, 3 GPUs      15.3
4 tasks, 4 GPUs       9.7
As you can see, scaling is very poor: going from 1 to 2 or 3 GPUs actually makes each SCF iteration slower than the single-GPU run, and only with 4 GPUs do I get a modest improvement over 1 GPU (9.7 s vs 11.9 s).
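(For anyone checking similar runs: one quick way to get per-iteration times is to diff the cumulative "total cpu time spent up to now" values that pw.x prints after each SCF iteration, e.g. with a one-liner like the sketch below.)

# per-iteration time = difference of successive cumulative times printed by pw.x
grep 'total cpu time spent up to now' aiida.out | \
  awk '{t = $(NF-1); if (NR > 1) print t - prev; prev = t}'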
From the output:
GPU acceleration is ACTIVE. 1 visible GPUs per MPI rank
GPU-aware MPI enabled
Message from routine print_cuda_info:
High GPU oversubscription detected. Are you sure this is what you want?
I tried --gpu-bind=map_gpu:0,1,2,3 to explicitly bind GPUs to ranks, but the warning still appears, and performance doesn’t change. I’ve also experimented with -nb and -ndiag parameters, but they either don’t help or make things worse.
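For concreteness, the variants I ran looked roughly like the following (the specific -nb / -ndiag values here are just examples of what I experimented with):

# explicit rank-to-GPU binding (the oversubscription warning still appears):
srun --gpu-bind=map_gpu:0,1,2,3 pw.x -pd .true. -npool 1 -in aiida.in > aiida.out
# band-group parallelization (example value):
srun pw.x -pd .true. -npool 1 -nb 4 -in aiida.in > aiida.out
# parallel diagonalization group (example value):
srun pw.x -pd .true. -npool 1 -ndiag 4 -in aiida.in > aiida.out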
Question:
Does anyone have experience optimizing Gamma-point-only calculations on multiple GPUs? Is there a known bottleneck or best practice for using multiple GPUs efficiently in such a case?
Any insights would be greatly appreciated.
Best,
Xing