<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>Dear Paolo,</p>
<p>Thank you for your prompt response. Your suggestion was very helpful. <br>
</p>
<p>I printed the numbers and found that, regardless of the value set in
<i>--gres=gpu:X</i>, ndev is always 1. Our HPC documentation states that
<i>--gres=gpu:X</i> is the correct way to request GPUs, and each node has 4 GPUs. Here is the output with
<i>--gres=gpu:4</i>:<br>
</p>
<i> GPU acceleration is ACTIVE.<br>
GPU-aware MPI enabled<br>
</i>
<p><i> nproc (MPI process): 4<br>
ndev (GPU per Node): 1<br>
nnode (Nodes): 1<br>
Message from routine print_cuda_info:<br>
High GPU oversubscription detected. Are you sure this is what you want?</i><br>
</p>
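<p>With ndev reported as 1, the check from <i>environment.f90</i> quoted further down (nproc > ndev * nnode * 2) becomes 4 > 2, which would explain why the warning is printed. Below is a minimal sketch of that arithmetic in shell. The function name <i>check_oversubscription</i> is only my own illustration and is not taken from the QE source:</p>
<pre>
```shell
# Hypothetical helper: reproduces the comparison quoted from environment.f90,
# nproc > ndev * nnode * 2. Arguments: nproc ndev nnode.
check_oversubscription() {
  nproc="$1"
  ndev="$2"
  nnode="$3"
  limit=$(( ndev * nnode * 2 ))
  if [ "$nproc" -gt "$limit" ]; then
    echo "oversubscribed: $nproc > $limit"
  else
    echo "ok: $nproc does not exceed $limit"
  fi
}

# Values copied from the pw.x output above.
check_oversubscription 4 1 1
```
</pre>
<p>With ndev = 4, as I originally expected, the same check gives 4 > 8, which is false, so no warning would be issued.</p>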
<p>I monitored GPU utilization every 10 seconds, and all 4 GPUs appear to be active with
<i>--gres=gpu:4</i>:<br>
<br>
<i>utilization.gpu [%], utilization.memory [%]<br>
96 %, 37 %<br>
95 %, 76 %<br>
95 %, 50 %<br>
95 %, 76 %<br>
time = 70 s</i><br>
</p>
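<p>To reduce a sample like the one above to a single number, a small filter could count the GPUs above a utilization threshold. This is only a sketch: the function name <i>count_busy_gpus</i> and the 50 % threshold are my own assumptions, and it expects the CSV layout shown above.</p>
<pre>
```shell
# Hypothetical helper: reads nvidia-smi CSV lines ("96 %, 37 %") on stdin and
# prints how many of them have utilization.gpu above the given threshold.
# Header lines and other non-numeric lines are skipped.
count_busy_gpus() {
  threshold="$1"
  awk -F'[ ,%]+' -v t="$threshold" '
    $1 ~ /^[0-9]+$/ { if ($1 + 0 > t) n++ }
    END { print n + 0 }
  '
}

# The four data lines from the sample above.
printf '96 %%, 37 %%\n95 %%, 76 %%\n95 %%, 50 %%\n95 %%, 76 %%\n' | count_busy_gpus 50
```
</pre>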
<p>For reference, here is my sbatch submission script:<br>
</p>
<p>-----------------------------------------------------------------------<br>
</p>
<p><i>#!/bin/bash -x<br>
#SBATCH --gres=gpu:4 --partition=dc-gpu<br>
#SBATCH --nodes=1<br>
#SBATCH --ntasks-per-node=4<br>
#SBATCH --time=00:00:20<br>
<br>
export OMP_NUM_THREADS=1<br>
<br>
module load NVHPC/23.7-CUDA-12<br>
module load CUDA/12<br>
module load OpenMPI/4.1.5<br>
module load mpi-settings/CUDA<br>
module load imkl/2023.2.0<br>
</i></p>
<p><i>monitor_gpu_usage() {<br>
while true; do<br>
nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv >> gpu_usage_$SLURM_JOB_ID.csv<br>
sleep 10<br>
done<br>
}<br>
monitor_gpu_usage &<br>
srun -n 4 pw.x -nk 4 -nd 1 -nb 1 -nt 1 < inp_pwscf > out_pwscf<br>
</i></p>
<p>-------------------------------------------------------------------------</p>
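<p>One further check I can run is to see how many devices each MPI task is given. The sketch below assumes that Slurm exports <i>CUDA_VISIBLE_DEVICES</i> as a comma-separated list to each task, which I have not verified on our system. The helper name <i>count_visible_gpus</i> is hypothetical.</p>
<pre>
```shell
# Hypothetical helper: counts the entries in a comma-separated device list
# such as "0,1,2,3". An empty list counts as 0 devices.
count_visible_gpus() {
  list="$1"
  if [ -z "$list" ]; then
    echo 0
    return
  fi
  # Put each entry on its own line, then count the lines.
  echo "$list" | tr ',' '\n' | wc -l | tr -d ' '
}

# Intended use inside the job (not run here), one line per MPI task:
#   srun -n 4 bash -c 'echo "task $SLURM_PROCID sees: $CUDA_VISIBLE_DEVICES"'
count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}"
```
</pre>
<p>If each task reports a single device, that would be consistent with ndev = 1 in the output above.</p>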
<p><br>
</p>
<p>Could you please provide guidance on resolving the oversubscription issue? Thank you very much in advance.</p>
<p>Kind regards,</p>
<p>Yin-Ying Ting</p>
<p><br>
</p>
<p></p>
<div class="moz-cite-prefix">On 29.11.23 15:53, Paolo Giannozzi wrote:<br>
</div>
<blockquote type="cite" cite="mid:f4769abe-ae16-477d-8a50-fe1095d76bb1@uniud.it">
On 11/27/23 11:32, Yin-Ying Ting wrote: <br>
<br>
<blockquote type="cite">Based on the *environment.f90* file, this message is triggered when /nproc > ndev * nnode * 2/. If I understand correctly, I have nproc (number of parallel processes) = 4, ndev (number of GPU devices per node) = 4 and nnode (number of nodes) = 1.
This condition seems to be false (4 > 8). Despite this, the message still appears. All 4 GPUs were active during the run.
<br>
</blockquote>
<br>
funny. Even funnier, the number of GPUs actually used does not seem to be written anywhere on output.
<br>
<br>
Add a line printing nproc, ndev, nnode just before the warning is issued, recompile and re-run. At least one of those numbers is not what you expect. Computers are not among the most reliable machines, but they should be able to find out which is larger,
4 or 8. <br>
<br>
Paolo <br>
</blockquote>
<div class="moz-signature">-- <br>
<p>Forschungszentrum Jülich GmbH<br>
Institute of Energy and Climate Research<br>
Theory and Computation of Energy Materials (IEK-13)<br>
E-mail: <a href="mailto:y.ting@fz-juelich.de" class="moz-txt-link-freetext">y.ting@fz-juelich.de</a>
</p>
</div>
<br>
</body>
</html>