[QE-users] Large time lag post software upgradation in HPC system

Pietro Davide Delugas pdelugas at sissa.it
Tue Oct 15 10:32:33 CEST 2024


Hello,
This is strange, because QE v7.3 is normally much faster than 6.7, especially on GPUs. The slowdown probably has to do with some fine-tuning in how the cluster is used.
You should ask the system managers of your cluster for help.
Just trying to guess:

  1. The problem might be hyperthreading, so make sure that OMP_NUM_THREADS is set to 1.
  2. Check whether GPU-aware MPI communication is working: try compiling with --with-cuda-mpi=no.
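
The two guesses above could be tried roughly as sketched here. This is only an illustration under assumptions: the input file name relax.in, the choice of 8 MPI ranks (one per GPU on the node described in the original message), and the mpirun launcher are all hypothetical, and your scheduler may need a different launcher. The configure flags are copied from the original build line, with only --with-cuda-mpi changed. The script just prints the commands rather than running them.

```shell
# Guess 1: one OpenMP thread per MPI rank, so hyperthreads are not oversubscribed.
export OMP_NUM_THREADS=1

# Assumption: one MPI rank per GPU (8 GPUs per node).
# relax.in is a hypothetical input file name.
NRANKS=8
RUN_CMD="mpirun -np ${NRANKS} pw.x -inp relax.in"

# Guess 2: rebuild without GPU-aware MPI to see whether it is the culprit.
# Same flags as the original build; only --with-cuda-mpi is changed to "no".
CONFIGURE_CMD="./configure --prefix=installation-location --with-cuda=\$CUDA_ROOT --with-cuda-runtime=12.2 --with-cuda-cc=80 --enable-openmp --with-scalapack=no --with-cuda-mpi=no"

# Print the commands instead of executing them, so this is safe to try anywhere.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
echo "${CONFIGURE_CMD}"
echo "${RUN_CMD}"
```

If the run with --with-cuda-mpi=no is much faster, the GPU-aware MPI stack on the upgraded system is the likely place to look, and that is a question for the system managers.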

Hope it helps.
Best regards,
Pietro
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Niharika Joshi <nh.joshi at ncl.res.in>
Sent: Tuesday, October 15, 2024 09:34
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: [QE-users] Large time lag post software upgradation in HPC system

Dear QE users,
I have been using an HPC resource for more than a year with QE (6.7Max GPU) without any issue. My present research problem focuses on studying methane and carbon dioxide adsorption on spinel surfaces. The system is large, with more than 380 atoms and ~3500 electrons. Normally, 2-3 ionic cycles (with 60-70 iterations) complete within a day. However, there has recently been a software upgrade on the computing system, after which I have observed a huge slowdown in my calculations. Currently, only a few iterations are performed in 24 hours.

Please find below two tables listing the hardware specifications of the computing system and the details of the software upgrade.

Component        Specification
CPU              AMD EPYC 7742 64C 2.25GHz
CPU cores        128 cores (dual socket, 64 cores each); 256 logical cores with hyper-threading
L3 cache         256 MB
RAM              1 TB
GPU              NVIDIA A100-SXM4
GPU memory       40 GB
GPUs per node    8
Storage          10.5 PiB PFS-based storage
Networking       Mellanox ConnectX-6 VPI (InfiniBand HDR)


Software                 Before upgrade                    After upgrade
OS                       Ubuntu 20.04.02 (DGX OS 5.0.5)    Ubuntu 22.04.04 (DGX OS 6.3.0)
Kernel                   5.4.0-80-generic                  5.15.0-1062-nvidia
CUDA                     10.1                              12.4 (lower versions are also available)
NVIDIA driver version    450.142.00                        550.90.07

After the software upgrade, QE-7.3 was installed as follows:

Step 1. Source the HPC-SDK environment:
source /opt/hpc-sdk-23.9/env.sh

Step 2. Configure the build:
./configure --prefix=installation-location --with-cuda=$CUDA_ROOT --with-cuda-runtime=12.2 --with-cuda-cc=80 --enable-openmp --with-scalapack=no --with-cuda-mpi=yes

Step 3. Compile the source code:
make all -j8

Step 4. Install the compiled binaries:
make install

Kindly suggest a solution to this problem. Any advice or suggestion at this point would be very helpful to me.

With best regards,
Niharika Joshi,
National Post Doctoral Fellow,
CSIR National Chemical Laboratory, Pune,
Maharashtra-411008, India.


