[QE-users] QE-GPU Running Error
Pietro Davide Delugas
pdelugas at sissa.it
Tue Sep 24 20:13:01 CEST 2024
Hello
Usually, the "libgomp: TODO" error message occurs because the program was linked against a CUDA version that is not compatible with the GPU driver running on the compute nodes.
You should ask the system administrators of your cluster which toolchain to use.
Alternatively, check the CUDA and driver versions and report them on the forum to get more help.
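A minimal sketch of how these versions could be collected is given here. It assumes that nvidia-smi, nvcc and pw.x are on the PATH once the modules from the job script are loaded, and that it is run on a GPU compute node (for example inside an interactive Slurm job). The function name report_versions is only illustrative.

```shell
#!/bin/bash
# Sketch: gather driver, CUDA compiler and linked-library information
# for a forum report. Assumes the same modules as the job script are loaded.

report_versions() {
    echo "== GPU driver (nvidia-smi) =="
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Prints the GPU model and the installed driver version
        nvidia-smi --query-gpu=name,driver_version --format=csv
    else
        echo "nvidia-smi not found in PATH"
    fi

    echo "== CUDA compiler (nvcc) =="
    if command -v nvcc >/dev/null 2>&1; then
        # Prints the CUDA toolkit release provided by the loaded modules
        nvcc --version
    else
        echo "nvcc not found in PATH"
    fi

    echo "== pw.x linked libraries =="
    if command -v pw.x >/dev/null 2>&1; then
        # Lists the CUDA and OpenMP runtime libraries pw.x resolves at run time
        # (libgomp is the GNU OpenMP runtime that prints the message above)
        ldd "$(command -v pw.x)" | grep -i -E 'cuda|cublas|cufft|nvhpc|gomp' \
            || echo "no matching libraries listed"
    else
        echo "pw.x not found in PATH"
    fi
}

report_versions
```

The plain nvidia-smi command (without options) also shows, in its header, the highest CUDA version the installed driver supports, which can be compared with the toolkit version reported by nvcc.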
Pietro
On 24 Sep 2024 7:06 PM, "Hazra, Shilpa" <shazra3 at uic.edu> wrote:
Hello,
I am using the CUDA version of Quantum ESPRESSO. The input file and the job submission script I am using are given below.
INPUT:
&control
calculation = 'vc-relax'
prefix = 'silicon'
outdir = './tmp/'
pseudo_dir = './'
etot_conv_thr = 1e-5
forc_conv_thr = 1e-4
/
&system
ibrav=2, celldm(1) =14,
nat=2, ntyp=1,
ecutwfc=30
/
&electrons
conv_thr=1e-8
/
&ions
/
&cell
cell_dofree='ibrav'
/
ATOMIC_SPECIES
Si 28.0855 Si.pbe-n-kjpaw_psl.1.0.0.UPF
ATOMIC_POSITIONS (alat)
Si 0.00 0.00 0.00 0 0 0
Si 0.25 0.25 0.25 0 0 0
K_POINTS (automatic)
6 6 6 1 1 1
SCRIPT:
#!/bin/bash
#SBATCH --job-name="test2"
#SBATCH --output="test2.out"
#SBATCH --partition=gpuA40x4
#SBATCH --mem=16G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16 # spread out to use 1 core per numa, set to 64 if tasks is 1
#SBATCH --constraint="scratch"
#SBATCH --gpus-per-node=4
#SBATCH --gpu-bind=closest # select a cpu close to gpu on pci bus topology
#SBATCH --account=bcox-delta-gpu # <- match to a "Project" returned by the "accounts" command
#SBATCH -t 08:00:00
#SBATCH -e slurm-%j.err
#SBATCH -o slurm-%j.out
module reset
module load nvhpc/22.11
module load openmpi/4.1.5+cuda
module load quantum-espresso/7.3.1+cuda
export OMP_NUM_THREADS=16 # if code is not multithreaded, otherwise set to 8 or 16
srun pw.x -N 1 -n 1 test2.in > test2.out
The job starts without problems, but no output is written to the output file even after the job has run for 30 hours. In addition, I get the following error message in the .err file:
libgomp: TODO
srun: error: gpub075: task 0: Exited with exit code 1
I do not understand why this error occurs. I tried adjusting the script for parallel execution, but I get the same error every time. Any help in solving this problem would be greatly appreciated. Thank you.
Sincerely,
Shilpa Hazra
Ph.D., Chemistry, University of Illinois Chicago