[QE-users] [QE-GPU] [PHonon] Runtime Crash in ph.x (v7.5 OpenACC) during Electron-Phonon Calculation (elphon.f90)
Dolon Pal
dolon.pal at icloud.com
Tue Nov 25 06:52:20 CET 2025
Dear Quantum ESPRESSO Developers and Community,
I am writing to report a persistent runtime error in the GPU-accelerated version of ph.x (Quantum ESPRESSO v7.5) when calculating electron-phonon coefficients using the OpenACC port.
While the code successfully calculates the Dynamical Matrices and Frequencies on the GPU, it consistently crashes during the final electron-phonon interaction step (routine elphon) with a File I/O error, specifically related to the temporary file a2Fsave.
1. System and Compilation Details:
Version: Quantum ESPRESSO v7.5 (GitLab release)
Compiler: NVIDIA HPC SDK v24.9
Configuration: ./configure --enable-openacc --with-cuda=yes --with-cuda-cc=89 --with-cuda-runtime=12.6
Hardware: NVIDIA RTX 4090 (Ada Lovelace)
MPI: OpenMPI (via NVIDIA HPC SDK)
2. The Issue: When running ph.x with electron_phonon = 'interpolated' (or any mode that triggers elphon), the execution aborts immediately after diagonalizing the dynamical matrix for the first q-point. The crash occurs regardless of the MPI parallelization level (reproduced with both -np 1 and -np 8).
3. Error Log: The crash points to a read error in elphon.f90 attempting to read a file that appears to be empty or not flushed to disk.
FIO-F-217/list-directed read/unit=40/attempt to read past end of file.
File name = './out/mgb2.a2Fsave', formatted, sequential access record = 1
In source file /path/to/q-e/PHonon/PH/elphon.f90, at line number 847
File name = './out/mgb2.a2Fsave', formatted, sequential access record = 1
In source file /path/to/q-e/PHonon/PH/elphon.f90, at line number 847
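To tell a missing file apart from one that exists but is empty, the state of a2Fsave can be inspected right after the crash. The following is only a minimal diagnostic sketch in Python, not part of Quantum ESPRESSO. The helper name describe_file is hypothetical, and the path is simply the one printed in the error message above.

```python
from pathlib import Path


def describe_file(path):
    """Return (exists, size_in_bytes, non_blank_line_count) for a text file."""
    p = Path(path)
    if not p.is_file():
        return (False, 0, 0)
    size = p.stat().st_size
    # Count lines that carry any data; an empty or blank-only file gives 0.
    with p.open("r", errors="replace") as fh:
        non_blank = sum(1 for line in fh if line.strip())
    return (True, size, non_blank)


if __name__ == "__main__":
    # Path copied from the error log; adjust to your own outdir and prefix.
    exists, size, lines = describe_file("./out/mgb2.a2Fsave")
    print(f"exists={exists} size={size} non_blank_lines={lines}")
```

A result of exists=True with size=0 would be consistent with the file having been opened but never written or flushed before the read at line 847. It would not by itself prove a race condition.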
4. Reproduction Case (MgB2): I reproduced this using a standard MgB2 test case.
Input snippet (ph.in):
&INPUTPH
tr2_ph = 1.0d-14,
prefix = 'mgb2',
outdir = './out',
fildyn = 'mgb2.dyn',
fildvscf = 'mgb2.dvscf',
electron_phonon = 'interpolated', ! <--- Triggers the crash
trans = .true.,
ldisp = .true.,
nq1=6, nq2=6, nq3=4
/
5. Observations:
Pure Phonons work: If I comment out electron_phonon, the GPU run finishes successfully and writes .dyn and .dvscf files.
CPU Works: The exact same input runs successfully on the CPU-only binary (gfortran compilation).
File Incompatibility: I attempted to run the heavy phonon calculation on the GPU and the final electron-phonon collection on the CPU (using recover=.true. or trans=.false.), but the CPU binary cannot read the GPU-generated .dvscf/binary files ("problems reading u" error). This is likely due to binary format/padding differences between nvfortran and gfortran.
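One rough way to probe the format/padding hypothesis is to run the same input with both binaries into separate output directories and compare file sizes. The sketch below is a hypothetical Python helper, not a Quantum ESPRESSO tool. The directory names ./out_gpu and ./out_cpu are assumptions for illustration only.

```python
import os


def size_map(root):
    """Map each file path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def compare_outdirs(gpu_root, cpu_root):
    """Return (only_gpu, only_cpu, size_mismatch) for two output trees."""
    gpu, cpu = size_map(gpu_root), size_map(cpu_root)
    only_gpu = sorted(set(gpu) - set(cpu))
    only_cpu = sorted(set(cpu) - set(gpu))
    # Files present in both trees whose byte sizes differ.
    mismatch = sorted(
        (rel, gpu[rel], cpu[rel])
        for rel in set(gpu) & set(cpu)
        if gpu[rel] != cpu[rel]
    )
    return only_gpu, only_cpu, mismatch


if __name__ == "__main__":
    # Hypothetical directory names; point these at your two outdir trees.
    if os.path.isdir("./out_gpu") and os.path.isdir("./out_cpu"):
        only_gpu, only_cpu, mismatch = compare_outdirs("./out_gpu", "./out_cpu")
        print("only in GPU run:", only_gpu)
        print("only in CPU run:", only_cpu)
        for rel, g, c in mismatch:
            print(f"{rel}: GPU {g} bytes, CPU {c} bytes")
```

Differing sizes for the same file would hint at a layout difference, while identical sizes would not rule out incompatible contents, so this is only a first filter.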
It appears there is a race condition or file handling issue in the OpenACC implementation of the elphon routine where the a2Fsave file is read before it is successfully written/closed.
Any advice on a workaround or a patch for elphon.f90 to stabilize the GPU I/O would be greatly appreciated.
Thank you for your time and for developing this software.
Best regards,
Dholon Kumar Paul
Research Assistant, BRAC University, Bangladesh