[QE-users] QE GPU ORTE_ERROR problem

Sitangshu Bhattacharya sitangshu at iiita.ac.in
Fri Apr 12 07:09:53 CEST 2024


Correction: sorry, nvcc -V actually reports CUDA 12.1 (not 12.2, as I wrote
below), and the GPUs are V100 cards.
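
To help narrow down the "system-required executable ... could not be found"
message quoted below, here is a minimal sketch that reports where each tool
resolves on PATH in the shell where ./pw.x is started. The helper name
report_tool and the tool list are illustrative assumptions. In particular,
including orted (the Open MPI runtime daemon) is only a guess at which
executable the error refers to; the log itself does not name it.

```shell
#!/bin/sh
# Hypothetical helper: print where a tool resolves on PATH,
# or note that it cannot be found.
report_tool() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        printf '%s: %s\n' "$1" "$resolved"
    else
        printf '%s: not found on PATH\n' "$1"
    fi
}

# Tool list is an assumption: the compilers and wrappers from the
# configure line, plus mpirun and orted from the Open MPI runtime.
for tool in nvcc nvfortran nvc mpif90 mpirun orted; do
    report_tool "$tool"
done
```

Running this after the same module purge / module load sequence used for the
build shows whether the runtime environment matches the build environment.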

On Fri, Apr 12, 2024 at 12:15 AM Sitangshu Bhattacharya <
sitangshu at iiita.ac.in> wrote:

> Hi,
>
> I am getting an MPI error when running the GPU build of QE 7.3.1. I used
> the following commands to install it:
>
> module purge
>
> module load nvhpc_23.5/nvhpc/23.5
>
> ./configure --with-cuda=$PATH --with-cuda-cc=70 --with-cuda-runtime=12.1
> --enable-parallel --enable-openmp --with-cuda-mpi=yes MPIF90=mpif90
> FC=nvfortran CC=nvc CXX=nvc++
>
> nvcc -V reports CUDA 12.2. The build completed without errors and all the
> binaries were generated. I then went to the bin directory and ran ./pw.x.
> Unfortunately, this prints:
>
> [login02:158963] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required
> executable either could not be found or was not executable by this user in
> file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
> 388
>
> [login02:158963] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required
> executable either could not be found or was not executable by this user in
> file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line
> 166
>
> --------------------------------------------------------------------------
>
> Sorry!  You were supposed to get help about:
>
>     orte_init:startup:internal-failure
>
> But I couldn't open the help file:
>
>     /proj/nv/libraries/Linux_x86_64/23.5/openmpi/227312-rel-2/share/openmpi/help-orte-runtime:
> No such file or directory.  Sorry!
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>
> Sorry!  You were supposed to get help about:
>
>     mpi_init:startup:internal-failure
>
> But I couldn't open the help file:
>
>     /proj/nv/libraries/Linux_x86_64/23.5/openmpi/227312-rel-2/share/openmpi/help-mpi-runtime.txt:
> No such file or directory.  Sorry!
>
> --------------------------------------------------------------------------
>
> *** An error occurred in MPI_Init_thread
>
> *** on a NULL communicator
>
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>
> ***    and potentially your MPI job)
>
> [login02:158963] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
>
>
> Any solutions?
>
>
> Regards,
> **********************************************
> Sitangshu Bhattacharya (সিতাংশু ভট্টাচার্য), Ph.D
> Assistant Professor,
> Room No. 2221, CC-1,
> Electronic Structure Theory Group,
> Department of Electronics and Communication Engineering,
> Indian Institute of Information Technology-Allahabad
> Uttar Pradesh 211 012
> India
> Telephone: 91-532-2922000 Extn.: 2131
> Web-page: http://profile.iiita.ac.in/sitangshu/
> Institute: http://www.iiita.ac.in/
>
>
