[QE-users] Problem in Convergence of ph.out in phonon calculations
Natalie Holzwarth
natalie at wfu.edu
Sun Aug 12 21:47:23 CEST 2018
I don't have an answer, but we have seen the same error message. For us it
accompanies an intermittent segmentation fault, which we think may be caused
by the interaction of QE with openmpi 3.1.0 and openmpi 3.1.1 compiled with
the intel 2018 compiler on our Red Hat RHEL6u9 cluster. The error happens
less frequently when we use openmpi 2.1.0. In our case the error channel
prints the following:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
ph.x               0000000000D99A1D  for__signal_handl  Unknown  Unknown
libpthread-2.12.s  0000003271E0F7E0  Unknown            Unknown  Unknown
mca_btl_vader.so   00002AB74BBB99A7  Unknown            Unknown  Unknown
libopen-pal.so.40  00002AB738AD3A54  opal_progress      Unknown  Unknown
libmpi.so.40.10.1  00002AB7384DBC04  ompi_request_defa  Unknown  Unknown
libmpi.so.40.10.1  00002AB7385384C5  ompi_coll_base_ba  Unknown  Unknown
libmpi.so.40.10.1  00002AB7384F26F1  MPI_Barrier        Unknown  Unknown
libmpi_mpifh.so.4  00002AB73826D013  MPI_Barrier_f08    Unknown  Unknown
ph.x               0000000000BA9E0E  Unknown            Unknown  Unknown
ph.x               0000000000B9835B  Unknown            Unknown  Unknown
ph.x               000000000057FE26  Unknown            Unknown  Unknown
ph.x               00000000004BE229  Unknown            Unknown  Unknown
ph.x               00000000004A0F10  Unknown            Unknown  Unknown
ph.x               0000000000415A65  Unknown            Unknown  Unknown
ph.x               000000000040EE73  Unknown            Unknown  Unknown
ph.x               000000000040EDDE  Unknown            Unknown  Unknown
libc-2.12.so       000000327161ED1D  __libc_start_main  Unknown  Unknown
ph.x               000000000040ECE9  Unknown            Unknown  Unknown
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [[24484,1],12]
Exit code: 174
--------------------------------------------------------------------------
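
When comparing failures across many runs, it can help to pull the failing process and exit code out of the mpirun abort block automatically. The following is only a minimal sketch, assuming the block has the "Process name: / Exit code:" layout shown above. The function name parse_mpirun_abort is hypothetical, and I am assuming the last number in the process name is the MPI rank.

```python
import re

# Patterns for the two fields in the mpirun abort block, in the layout
# shown in this thread. Assumption: the last number in "Process name"
# is the rank of the failing process.
PROC_RE = re.compile(r"Process name:\s*\[\[(\d+),(\d+)\],(\d+)\]")
EXIT_RE = re.compile(r"Exit code:\s*(-?\d+)")


def parse_mpirun_abort(text):
    """Return a list of (rank, exit_code) pairs found in the given text."""
    results = []
    rank = None
    for line in text.splitlines():
        proc = PROC_RE.search(line)
        if proc:
            rank = int(proc.group(3))
            continue
        code = EXIT_RE.search(line)
        if code and rank is not None:
            results.append((rank, int(code.group(1))))
            rank = None  # wait for the next "Process name" line
    return results


sample = """\
Process name: [[24484,1],12]
Exit code: 174
"""
print(parse_mpirun_abort(sample))  # [(12, 174)]
```

In the output above the exit code 174 matches the "forrtl: severe (174)" segmentation fault, so a helper like this at least lets you separate those crashes from runs that stop with a different code.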
It seems to be a big mystery.

Sincerely,
Natalie Holzwarth
N. A. W. Holzwarth             email: natalie at wfu.edu
Department of Physics          web: http://www.wfu.edu/~natalie
Wake Forest University         phone: 1-336-758-5510
Winston-Salem, NC 27109 USA    office: Rm. 300 Olin Physical Lab
On Sun, Aug 12, 2018 at 3:06 PM Sina Malakpour <sina.malakpour at gmail.com>
wrote:
> Dear all,
>
> I have recently been using the linear response method implemented in QE to
> run phonon calculations for a structure. I apply different isotropic strains
> to the optimized structure and then run the phonon calculation in these
> steps:
>
> 1. scf run
> 2. ph run
> 3. q2r run
>
> For all strains the scf run is fine, but for some strains the ph run ends
> with this message at the end of the output file:
>
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
> Process name: [[47696,1],6]
> Exit code: 28
> --------------------------------------------------------------------------
>
> As a result, the force constants file is not generated. I would really
> appreciate it if you could guide me through this and let me know what to do
> to fix the problem.
>
> Thanks,
> Sina
>
> Sina Malakpour Estalaki
> PhD student
> Department of Aerospace and Mechanical Engineering
> University of Notre Dame
> Notre Dame, IN, US
>