[QE-users] QE-GPU Running Error

Tue Sep 24 20:13:01 CEST 2024

Hello

Usually the "libgomp: TODO" error message appears because the program was linked against a CUDA version that is not compatible with the GPU driver running on the nodes.
You should ask the system administrators of your cluster which toolchain to use.
Alternatively, check the CUDA and driver versions and report them on the forum to get more help.
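A minimal sketch for collecting those versions. It assumes it is run on a GPU node (inside the batch job or an interactive allocation, since `nvidia-smi` is usually missing on login nodes) with the same modules loaded as in the job script quoted below. The function name `report_gpu_toolchain` is hypothetical, not part of QE or Slurm.

```shell
#!/bin/bash
# Hypothetical helper: print the GPU driver version, the CUDA toolkit version
# and the CUDA/OpenMP libraries that pw.x is linked against, so they can be
# compared. Assumes the nvhpc/openmpi/quantum-espresso modules are loaded.
report_gpu_toolchain() {
    echo "== GPU driver =="
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Header line shows the driver version and the newest CUDA it supports
        nvidia-smi | grep -m1 'Driver Version'
    else
        echo "nvidia-smi: not found (run this on a GPU node)"
    fi

    echo "== CUDA toolkit =="
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version | grep -i 'release'
    else
        echo "nvcc: not found"
    fi

    echo "== CUDA libraries linked into pw.x =="
    local exe
    exe=$(command -v pw.x 2>/dev/null)
    if [ -n "$exe" ]; then
        # 'gomp' shows whether GCC's OpenMP runtime (libgomp) is pulled in
        ldd "$exe" 2>/dev/null | grep -Ei 'cuda|cublas|cufft|gomp|nvomp' \
            || echo "no matching libraries listed"
    else
        echo "pw.x: not found"
    fi
}

report_gpu_toolchain
```

If the CUDA release that `pw.x` was built with is newer than the "CUDA Version" reported by `nvidia-smi`, that is a likely candidate for the kind of incompatibility described above; these are the numbers worth posting.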
Pietro

On 24 Sep 2024 at 7:06 PM, "Hazra, Shilpa" <shazra3 at uic.edu> wrote:
Hello,

I am using the CUDA version of Quantum ESPRESSO. The input file and the job submission script I am using are shown below.

INPUT:

&control
    calculation = 'vc-relax'
    prefix = 'silicon'
    outdir = './tmp/'
    pseudo_dir = './'
    etot_conv_thr = 1e-5
    forc_conv_thr = 1e-4
/

&system
    ibrav=2, celldm(1) =14,
    nat=2, ntyp=1,
    ecutwfc=30
/

&electrons
    conv_thr=1e-8
/

&ions
/

&cell
    cell_dofree='ibrav'
/

ATOMIC_SPECIES
  Si  28.0855  Si.pbe-n-kjpaw_psl.1.0.0.UPF

ATOMIC_POSITIONS (alat)
  Si 0.00 0.00 0.00 0 0 0
  Si 0.25 0.25 0.25 0 0 0

K_POINTS (automatic)
  6 6 6 1 1 1
SCRIPT:

#!/bin/bash
#SBATCH --job-name="test2"
#SBATCH --output="test2.out"
#SBATCH --partition=gpuA40x4
#SBATCH --mem=16G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16   # spread out to use 1 core per numa, set to 64 if tasks is 1
#SBATCH --constraint="scratch"
#SBATCH --gpus-per-node=4
#SBATCH --gpu-bind=closest   # select a cpu close to gpu on pci bus topology
#SBATCH --account=bcox-delta-gpu    # <- match to a "Project" returned by the "accounts" command
#SBATCH -t 08:00:00
#SBATCH -e slurm-%j.err
#SBATCH -o slurm-%j.out

module reset
module load nvhpc/22.11
module load openmpi/4.1.5+cuda
module load quantum-espresso/7.3.1+cuda

export OMP_NUM_THREADS=16  # if code is not multithreaded, otherwise set to 8 or 16
srun pw.x -N 1 -n 1 test2.in > test2.out

The job appears to run, but no output data is written to the output file, even after 30 hours. In addition, I get the following error message in the .err file:

libgomp: TODO
srun: error: gpub075: task 0: Exited with exit code 1

I do not understand why this error occurs. I tried adjusting the script for parallel runs, but I always get the same error. Could you please help me solve this problem? Any help would be greatly appreciated. Thank you.

Sincerely,
Shilpa Hazra
Ph.D., Chemistry, University of Illinois Chicago
