[Pw_forum] Problem with MPI parallelization: Error in routine zsqmred

Paolo Giannozzi p.giannozzi at gmail.com
Fri Sep 2 10:17:05 CEST 2016


First of all, try to figure out whether the problem is reproducible on another
machine or with another software configuration (compilers, libraries, etc.).
Nobody has ever reported such an error.
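
For example (just a sketch, with placeholder file and module names, not a
recipe), you could rerun the very same input while varying the number of MPI
tasks, the size of the linear-algebra group used for parallel subspace
diagonalization (pw.x's -ndiag option), and, if your cluster provides one,
a different MPI/compiler toolchain:

    # same input, different task counts and -ndiag values
    mpirun -np 64 pw.x -ndiag 1  -input pw.scf.in > out.64.nd1
    mpirun -np 64 pw.x -ndiag 16 -input pw.scf.in > out.64.nd16
    mpirun -np  8 pw.x -ndiag 4  -input pw.scf.in > out.8.nd4

    # different toolchain, if available (placeholder module names)
    module swap intel-mpi openmpi
    mpirun -np 64 pw.x -ndiag 1  -input pw.scf.in > out.64.openmpi

If the error shows up only for particular -ndiag values, or only with one
MPI library, that already narrows the problem down considerably.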

Paolo

On Fri, Sep 2, 2016 at 9:43 AM, Jan Oliver Oelerich <jan.oliver.oelerich at physik.uni-marburg.de> wrote:

> Hi QE users,
>
> I am trying to run QE 5.4.0 with MPI parallelization on a mid-size
> cluster. I successfully tested the installation using 8 processes
> distributed over 2 nodes, so communication across nodes is not a problem.
> However, when I run the same calculation on 64 cores, I get the following
> error messages on stdout:
>
>
>        iteration #  1     ecut=    30.00 Ry     beta=0.70
>        Davidson diagonalization with overlap
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>        Error in routine  zsqmred (8):
>
>         somthing wrong with row 3
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
>        stopping ...
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
>        Error in routine  zsqmred (4):
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>         somthing wrong with row 3
>        Error in routine  zsqmred (12):
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>         somthing wrong with row 3
>
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>        stopping ...
>
>        stopping ...
>
>
> The cluster queue's stderr shows that some MPI processes exited:
>
>
> PSIlogger: Child with rank 28 exited with status 12.
> PSIlogger: Child with rank 8 exited with status 4.
> application called MPI_Abort(MPI_COMM_WORLD, 12) - process 28
> application called MPI_Abort(MPI_COMM_WORLD, 4) - process 8
> application called MPI_Abort(MPI_COMM_WORLD, 8) - process 18
> kvsprovider[12375]: sighandler: Terminating the job.
> PSIlogger: Child with rank 18 exited with status 8.
> PSIlogger: Child with rank 4 exited with status 1.
> PSIlogger: Child with rank 15 exited with status 1.
> PSIlogger: Child with rank 53 exited with status 1.
> PSIlogger: Child with rank 30 exited with status 1.
>
>
> The cluster is running some sort of Sun Grid Engine, and I used Intel
> MPI. I see no other error messages. Could you give me a hint on how to
> debug this further? Verbosity is already set to 'high'.
>
> Thank you very much and best regards,
> Jan Oliver Oelerich
>
>
>
>
> --
> Dr. Jan Oliver Oelerich
> Faculty of Physics and Material Sciences Center
> Philipps-Universität Marburg
>
> Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
> Phone: +49 6421 2822260
> Mail : jan.oliver.oelerich at physik.uni-marburg.de
> Web  : http://academics.oelerich.org
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum




-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222