[QE-users] need help, QE parallel is working voidly

Mainak Ghosh mainakghosh555555 at gmail.com
Fri Dec 25 21:01:58 CET 2020


Dear all,
          Today I tried to run a Quantum ESPRESSO calculation in parallel
from the command line with:
      " mpirun -np 32 '/home/mainak/Desktop/qe-6.5/bin/pw.x' -npool 4 -bgrp 4 -ndiag 36 < pha_nscf.in | tee pha_nscf.out "
     The run completed and the results are correct, but the terminal
shows:


"      PWSCF        :  41m 7.66s CPU      7h22m WALL


   This run was terminated on:  20:59:32  25Dec2020

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=
[mainak-linpc:26225] [[38684,1],0]-[[38684,0],0] mca_oob_tcp_msg_recv:
readv failed: Connection timed out (110)
[mainak-linpc:26227] [[38684,1],2]-[[38684,0],0] mca_oob_tcp_msg_recv:
readv failed: Connection timed out (110)
[mainak-linpc:26229] [[38684,1],4]-[[38684,0],0] mca_oob_tcp_msg_recv:
readv failed: Connection timed out (110)
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 26227 on
node mainak-linpc exiting improperly. There are two reasons this could
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
"


    As you can see in the timing line,

"  PWSCF        :  41m 7.66s CPU      7h22m WALL "
    the wall time (7h22m) is far longer than the CPU time (41m). I have been
unable to find out what went wrong. Any advice would be very helpful. For
reference, my CPU has 4 physical cores. I am also attaching the '.out' file
below.
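
The figures quoted in this message can be put side by side with a short
calculation. The sketch below is illustrative only: it uses the numbers given
above (32 MPI processes, 4 physical cores, 41m 7.66s CPU, 7h22m wall), the
variable names are made up for the example, and it only does the arithmetic
without diagnosing the cause.

```shell
#!/bin/sh
# Figures taken from this message; variable names are hypothetical.
ranks=32                      # value passed to "mpirun -np"
cores=4                       # physical cores reported in the post

# Convert the reported times to seconds.
cpu_seconds=$(awk 'BEGIN { print 41*60 + 7.66 }')   # 41m 7.66s CPU
wall_seconds=$((7*3600 + 22*60))                    # 7h22m WALL

# MPI processes requested per physical core.
ranks_per_core=$((ranks / cores))

# CPU time as a percentage of wall time, one decimal place.
cpu_wall_percent=$(awk -v c="$cpu_seconds" -v w="$wall_seconds" \
    'BEGIN { printf "%.1f", 100 * c / w }')

echo "MPI processes per physical core: $ranks_per_core"
echo "CPU time as a share of wall time: ${cpu_wall_percent}%"
```

The same arithmetic can be repeated with a different value of "ranks" to
compare against a run that uses fewer processes.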

      Thanks in advance.
                                                                  Mainak Ghosh
                                                               University of Calcutta, India