[Pw_forum] Problem with MPI parallelization: Error in routine zsqmred

Jan Oliver Oelerich jan.oliver.oelerich at physik.uni-marburg.de
Fri Sep 2 09:43:59 CEST 2016


Hi QE users,

I am trying to run QE 5.4.0 with MPI parallelization on a mid-size 
cluster. I successfully tested the installation with 8 processes 
distributed over 2 nodes, so communication across nodes is not the 
problem.
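
For reference, the two invocations look roughly like this (the mpirun 
options, process placement, and input file name are placeholders for 
illustration, not my literal job script):

       # working test: 8 MPI processes spread over 2 nodes (4 per node)
       mpirun -np 8 -ppn 4 pw.x -input scf.in > scf.out

       # failing run: same input and executable, 64 MPI processes
       mpirun -np 64 pw.x -input scf.in > scf.out
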
However, when I run the same calculation on 64 cores, I get the 
following error messages on stdout:


       iteration #  1     ecut=    30.00 Ry     beta=0.70
       Davidson diagonalization with overlap

 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       Error in routine  zsqmred (8):

        somthing wrong with row 3
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

       stopping ...
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

       Error in routine  zsqmred (4):
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        somthing wrong with row 3
       Error in routine  zsqmred (12):
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        somthing wrong with row 3

 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
       stopping ...

       stopping ...


The stderr log from the cluster queue shows that some MPI processes 
exited:


PSIlogger: Child with rank 28 exited with status 12.
PSIlogger: Child with rank 8 exited with status 4.
application called MPI_Abort(MPI_COMM_WORLD, 12) - process 28
application called MPI_Abort(MPI_COMM_WORLD, 4) - process 8
application called MPI_Abort(MPI_COMM_WORLD, 8) - process 18
kvsprovider[12375]: sighandler: Terminating the job.
PSIlogger: Child with rank 18 exited with status 8.
PSIlogger: Child with rank 4 exited with status 1.
PSIlogger: Child with rank 15 exited with status 1.
PSIlogger: Child with rank 53 exited with status 1.
PSIlogger: Child with rank 30 exited with status 1.


The cluster is running some sort of Sun Grid Engine, and I used Intel 
MPI. I see no other error messages. Could you give me a hint on how to 
debug this further? Verbosity is already set to 'high'.
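
In case it is relevant: zsqmred appears to belong to QE's distributed 
linear-algebra layer used by the Davidson diagonalization, so one test 
I could run is to shrink the diagonalization group to a single process. 
A sketch, assuming the standard pw.x command-line flags (the input file 
name is again a placeholder):

       # same 64-process run, but with the parallel diagonalization
       # group reduced to a single process via -ndiag 1
       mpirun -np 64 pw.x -ndiag 1 -input scf.in > scf.out

Would that be a sensible way to narrow the problem down?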

Thank you very much and best regards,
Jan Oliver Oelerich




-- 
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg

Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : jan.oliver.oelerich at physik.uni-marburg.de
Web  : http://academics.oelerich.org


