[QE-users] QE GPU ORTE_ERROR problem

Sitangshu Bhattacharya sitangshu at iiita.ac.in
Thu Apr 11 20:45:57 CEST 2024


Hi,

I am getting an MPI error when running the GPU version of QE 7.3.1. I
used the following commands to build it:

module purge

module load nvhpc_23.5/nvhpc/23.5

./configure --with-cuda=$PATH --with-cuda-cc=70 --with-cuda-runtime=12.1 \
  --enable-parallel --enable-openmp --with-cuda-mpi=yes MPIF90=mpif90 \
  FC=nvfortran CC=nvc CXX=nvc++

nvcc -V reports CUDA 12.2. The build completed without errors and all the
binaries were generated. I then went to the bin directory and ran ./pw.x.
Unfortunately, this shows:

[login02:158963] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required
executable either could not be found or was not executable by this user in
file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
388
[login02:158963] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required
executable either could not be found or was not executable by this user in
file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
166
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    orte_init:startup:internal-failure
But I couldn't open the help file:
    /proj/nv/libraries/Linux_x86_64/23.5/openmpi/227312-rel-2/share/openmpi/help-orte-runtime:
No such file or directory.  Sorry!
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    mpi_init:startup:internal-failure
But I couldn't open the help file:
    /proj/nv/libraries/Linux_x86_64/23.5/openmpi/227312-rel-2/share/openmpi/help-mpi-runtime.txt:
No such file or directory.  Sorry!
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[login02:158963] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
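
The reference to ess_singleton_module.c suggests that Open MPI tried to
start its own helper daemon because pw.x was run without mpirun. Below is a
minimal sketch of a check for that. It assumes the helper is the orted
binary shipped with the Open MPI bundled in NVHPC, and that it should be
reachable on PATH once the module is loaded. Neither assumption is
confirmed for this cluster, and check_tools is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool is reachable on PATH.
# Returns 0 if all tools were found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            echo "found   $tool -> $location"
        else
            echo "missing $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: these are the Open MPI tools that a singleton start relies on.
check_tools mpirun orted mpif90 || echo "one or more MPI tools are not on PATH"

# Open MPI reads OPAL_PREFIX when the install was moved from its build prefix.
echo "OPAL_PREFIX=${OPAL_PREFIX:-<unset>}"
```

If orted is reported missing, it may be worth re-loading the nvhpc module
in the shell where pw.x runs, or launching through mpirun -np 1 ./pw.x
instead of running the binary directly. Neither is confirmed as the fix
here.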


Any solutions?


Regards,
**********************************************
Sitangshu Bhattacharya (সিতাংশু ভট্টাচার্য), Ph.D
Assistant Professor,
Room No. 2221, CC-1,
Electronic Structure Theory Group,
Department of Electronics and Communication Engineering,
Indian Institute of Information Technology-Allahabad
Uttar Pradesh 211 012
India
Telephone: 91-532-2922000 Extn.: 2131
Web-page: http://profile.iiita.ac.in/sitangshu/
Institute: http://www.iiita.ac.in/