[QE-users] [QE-GPU] [PHonon] Runtime Crash in ph.x (v7.5 OpenACC) during Electron-Phonon Calculation (elphon.f90)

Dolon Pal dolon.pal at icloud.com
Tue Nov 25 06:52:20 CET 2025


Dear Quantum ESPRESSO Developers and Community,

I am writing to report a persistent runtime error in the GPU-accelerated version of ph.x (Quantum ESPRESSO v7.5) when calculating electron-phonon coefficients using the OpenACC port.

The code computes the dynamical matrices and phonon frequencies on the GPU without problems. However, it consistently crashes in the final electron-phonon step (routine elphon) with a file I/O error involving the a2Fsave file.

1. System and Compilation Details:

Version: Quantum ESPRESSO v7.5 (GitLab release)

Compiler: NVIDIA HPC SDK v24.9

Configuration: ./configure --enable-openacc --with-cuda=yes --with-cuda-cc=89 --with-cuda-runtime=12.6

Hardware: NVIDIA RTX 4090 (Ada Lovelace)

MPI: OpenMPI (via NVIDIA HPC SDK)

2. The Issue: When running ph.x with electron_phonon = 'interpolated' (or any mode that triggers elphon), the execution aborts immediately after diagonalizing the dynamical matrix for the first q-point. The crash occurs regardless of the MPI parallelization level (reproduced with both -np 1 and -np 8).

3. Error Log: The crash is a read error in elphon.f90, which tries to read a file that appears to be empty or not yet flushed to disk:

 FIO-F-217/list-directed read/unit=40/attempt to read past end of file.
 File name = './out/mgb2.a2Fsave',   formatted, sequential access   record = 1
 In source file /path/to/q-e/PHonon/PH/elphon.f90, at line number 847
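
For reference, "attempt to read past end of file" is the end-of-file condition. A list-directed READ hits it on the very first record when the formatted file it reads is empty. The minimal sketch below is a generic illustration, not code from elphon.f90. The module name eof_demo, its routines and any file name passed to them are hypothetical. The sketch reads with IOSTAT= so that the end-of-file condition is returned to the caller instead of aborting the run.

```fortran
! Generic sketch (not QE code): a list-directed READ on an empty,
! formatted, sequential file hits end-of-file on its first record.
module eof_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Create (or truncate) a file that contains no records at all.
  subroutine make_empty_file(fname)
    character(len=*), intent(in) :: fname
    integer :: u
    open(newunit=u, file=fname, form='formatted', status='replace', &
         action='write')
    close(u)
  end subroutine make_empty_file

  ! Try to read one real value list-directed. Instead of letting the
  ! runtime abort (as with FIO-F-217), return the IOSTAT code:
  ! 0 = success, IOSTAT_END (negative) = end-of-file, > 0 = other error.
  integer function try_read_first_value(fname, x) result(ios)
    character(len=*), intent(in) :: fname
    real(dp), intent(out) :: x
    integer :: u
    x = 0.0_dp
    open(newunit=u, file=fname, form='formatted', status='old', &
         action='read', iostat=ios)
    if (ios /= 0) return          ! e.g. the file does not exist
    read(u, *, iostat=ios) x
    close(u)
  end function try_read_first_value
end module eof_demo
```

Calling make_empty_file on a hypothetical 'empty.a2Fsave' and then try_read_first_value on the same name should return a code equal to IOSTAT_END from iso_fortran_env. With this guard the program does not stop.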

4. Reproduction Case (MgB2): I reproduced the crash with a standard MgB2 test case.

Input snippet (ph.in):

&INPUTPH
  tr2_ph   = 1.0d-14,
  prefix   = 'mgb2',
  outdir   = './out',
  fildyn   = 'mgb2.dyn',
  fildvscf = 'mgb2.dvscf',
  electron_phonon = 'interpolated',  ! <--- Triggers the crash
  trans    = .true.,
  ldisp    = .true.,
  nq1=6, nq2=6, nq3=4
/

5. Observations:

Pure phonons work: If I comment out electron_phonon, the GPU run finishes successfully and writes the .dyn and .dvscf files.

CPU works: The exact same input runs successfully with the CPU-only binary (gfortran build).

File incompatibility: I tried running the expensive phonon part on the GPU and the final electron-phonon step on the CPU (using recover=.true. or trans=.false.). However, the CPU binary cannot read the GPU-generated .dvscf/binary files and fails with a "problems reading u" error. This is probably due to binary format or padding differences between nvfortran and gfortran.

This suggests a race condition or file-handling issue in the OpenACC implementation of the elphon routine, where the a2Fsave file is read before it has been completely written and closed.
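
The ordering that such a race would violate can be illustrated with a generic write, close, then read sequence. This is a hedged sketch, not the actual elphon logic. The module ordered_io_sketch and its routines are hypothetical names. The reader opens the file only after the writer has closed it, so every record is complete on disk. Suppose instead the read were issued before the writer's CLOSE, for example from another process. It could then find an empty or truncated file and hit the end-of-file error shown in the log.

```fortran
! Generic sketch (not QE's elphon code): the writer finishes and CLOSEs
! the file before any reader OPENs it, so the reader sees every record.
module ordered_io_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine write_values(fname, vals)
    character(len=*), intent(in) :: fname
    real(dp), intent(in) :: vals(:)
    integer :: u, i
    open(newunit=u, file=fname, form='formatted', status='replace', &
         action='write')
    do i = 1, size(vals)
      write(u, *) vals(i)
    end do
    flush(u)   ! redundant right before CLOSE; matters if the file stays open
    close(u)   ! CLOSE completes the file on disk
  end subroutine write_values

  ! Read size(vals) values; ios /= 0 signals a missing file, an I/O
  ! error, or end-of-file (i.e. the file was empty or truncated).
  subroutine read_values(fname, vals, ios)
    character(len=*), intent(in) :: fname
    real(dp), intent(out) :: vals(:)
    integer, intent(out) :: ios
    integer :: u, i
    vals = 0.0_dp
    open(newunit=u, file=fname, form='formatted', status='old', &
         action='read', iostat=ios)
    if (ios /= 0) return
    do i = 1, size(vals)
      read(u, *, iostat=ios) vals(i)
      if (ios /= 0) exit
    end do
    close(u)
  end subroutine read_values
end module ordered_io_sketch
```

FLUSH is redundant immediately before CLOSE. It is included because FLUSH is the statement to use when a file must stay open while something else reads it.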

Any advice on a workaround or a patch for elphon.f90 to stabilize the GPU I/O would be greatly appreciated.

Thank you for your time and for developing this software.

Best regards,

Dholon Kumar Paul

Research Assistant, BRAC University, Bangladesh