[QE-users] GIPAW Segmentation Fault

Holzwarth, Natalie natalie at wfu.edu
Tue Aug 7 18:51:23 CEST 2018


I am not sure what the output is for that particular line, but the last
part of the output is given below. The iteration at which the run stops
differs from one instance to the next, but the boxed error message returned
by phonon.f90 is always the same. We have seen the error in both the 6.2.1
and 6.3 versions of the QE package. If this is of interest, I will be glad
to send any additional information. The run is for NaCl. Thanks,
Natalie
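
As a minimal sketch, the ph.x addresses in the traceback quoted further down could be collected and handed to addr2line in a single call rather than one at a time. The helper names frame_addresses and addr2line_command are hypothetical, the parsing assumes the forrtl traceback layout shown in this thread, and addr2line only reports a file and line when ph.x was built with debugging symbols.

```python
import re

# One frame line of an Intel Fortran (forrtl) traceback: optional mail-quote
# markers, the image name, then a 16-digit hexadecimal program counter.
FRAME = re.compile(r"^\s*(?:>\s*)*(\S+)\s+([0-9A-Fa-f]{16})\b")


def frame_addresses(trace, image="ph.x"):
    """Return the program counters of all frames that belong to `image`."""
    addresses = []
    for line in trace.splitlines():
        match = FRAME.match(line)
        if match and match.group(1) == image:
            addresses.append(match.group(2))
    return addresses


def addr2line_command(executable, addresses):
    """Build an addr2line call: -p pretty-prints, -f adds function names."""
    return ["addr2line", "-p", "-f", "-e", executable] + list(addresses)


# Three frames copied from the traceback in this thread.
SAMPLE = """\
ph.x               000000000057FE26  Unknown               Unknown  Unknown
ph.x               00000000004BE229  Unknown               Unknown  Unknown
libc-2.12.so       000000327161ED1D  __libc_start_main     Unknown
"""

if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(addr2line_command("ph.x", frame_addresses(SAMPLE))))
```

The printed command would be run in the directory that holds the ph.x binary that produced the traceback.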

------------- end of phonon output file:
     Representation #  3 mode #   3

     Self-consistent Calculation

      iter #   1 total cpu time :  1573.2 secs   av.it.:   6.9
      thresh= 1.000E-02 alpha_mix =  0.700 |ddv_scf|^2 =  1.571E-05

      iter #   2 total cpu time :  1602.0 secs   av.it.:  11.4
      thresh= 3.964E-04 alpha_mix =  0.700 |ddv_scf|^2 =  1.095E-05

      iter #   3 total cpu time :  1629.0 secs   av.it.:  10.6
      thresh= 3.308E-04 alpha_mix =  0.700 |ddv_scf|^2 =  3.191E-08

      iter #   4 total cpu time :  1657.8 secs   av.it.:  11.2
      thresh= 1.786E-05 alpha_mix =  0.700 |ddv_scf|^2 =  2.458E-10

      iter #   5 total cpu time :  1687.6 secs   av.it.:  11.9
      thresh= 1.568E-06 alpha_mix =  0.700 |ddv_scf|^2 =  2.328E-11
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
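
Since the point at which the job aborts differs between runs, it may help to pull the last completed iteration out of each output file and compare them. A minimal sketch, assuming the log layout shown above; last_iteration is a hypothetical helper and not part of QE.

```python
import re

# Patterns for the two lines that ph.x prints per self-consistent iteration.
ITER = re.compile(r"iter #\s*(\d+)\s+total cpu time :\s*([\d.]+) secs")
DDV = re.compile(r"\|ddv_scf\|\^2 =\s*([0-9.Ee+-]+)")


def last_iteration(log_text):
    """Return (iteration, cpu_seconds, ddv_scf_sq) for the last iteration
    whose two lines are both present, or None if there is none."""
    last = None
    pending = None
    for line in log_text.splitlines():
        match = ITER.search(line)
        if match:
            pending = (int(match.group(1)), float(match.group(2)))
            continue
        match = DDV.search(line)
        if match and pending is not None:
            last = (pending[0], pending[1], float(match.group(1)))
            pending = None
    return last


# The final iteration from the output excerpt above.
SAMPLE_LOG = """\
      iter #   5 total cpu time :  1687.6 secs   av.it.:  11.9
      thresh= 1.568E-06 alpha_mix =  0.700 |ddv_scf|^2 =  2.328E-11
"""

if __name__ == "__main__":
    print(last_iteration(SAMPLE_LOG))
```

Applied to the output of several failed runs, this gives one tuple per run, which makes it easy to see whether the abort clusters around a particular iteration.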


N. A. W. Holzwarth              email: natalie at wfu.edu
Department of Physics           web: http://www.wfu.edu/~natalie
Wake Forest University          phone: 1-336-758-5510
Winston-Salem, NC 27109 USA     office: Rm. 300 Olin Physical Lab

On Tue, Aug 7, 2018 at 12:00 PM, Pietro Delugas <pdelugas at sissa.it> wrote:

> hello
>
> what is the output of
>
> addr2line -p -e ph.x 00000000004BE229
> and what version of ph are you using ?
>
>
> On 07/08/2018 17:06, Holzwarth, Natalie wrote:
>
> This segmentation fault issue has also appeared for us in another QE
> code. Perhaps it is a totally unrelated problem, which we find to be related
> to the openmpi package compiled with intel-3.1.1-2018 and intel-3.1.0-2018.
> In our case, compiling with the openmpi package compiled with
> intel-2.1.0-2018 usually (not always) solves the problem. Compiling with a
> much older openmpi package solves the problem, but is not viable for the
> current configuration of our cluster. Since no one else has mentioned the
> problem until now, we think it may have to do with our internal setup. The
> error is very intermittent, occurring at different places in the code for
> the same input. What this error has in common with the one in the original
> message is libpthread-2.12.so. We see the error most reliably in ph.x,
> with this part of the error message:
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image              PC                Routine            Line     Source
> ph.x               0000000000D99A1D  for__signal_handl     Unknown  Unknown
> libpthread-2.12.s  0000003271E0F7E0  Unknown               Unknown  Unknown
> mca_btl_vader.so   00002AB74BBB99A7  Unknown               Unknown  Unknown
> libopen-pal.so.40  00002AB738AD3A54  opal_progress         Unknown  Unknown
> libmpi.so.40.10.1  00002AB7384DBC04  ompi_request_defa     Unknown  Unknown
> libmpi.so.40.10.1  00002AB7385384C5  ompi_coll_base_ba     Unknown  Unknown
> libmpi.so.40.10.1  00002AB7384F26F1  MPI_Barrier           Unknown  Unknown
> libmpi_mpifh.so.4  00002AB73826D013  MPI_Barrier_f08       Unknown  Unknown
> ph.x               0000000000BA9E0E  Unknown               Unknown  Unknown
> ph.x               0000000000B9835B  Unknown               Unknown  Unknown
> ph.x               000000000057FE26  Unknown               Unknown  Unknown
> ph.x               00000000004BE229  Unknown               Unknown  Unknown
> ph.x               00000000004A0F10  Unknown               Unknown  Unknown
> ph.x               0000000000415A65  Unknown               Unknown  Unknown
> ph.x               000000000040EE73  Unknown               Unknown  Unknown
> ph.x               000000000040EDDE  Unknown               Unknown  Unknown
> libc-2.12.so       000000327161ED1D  __libc_start_main     Unknown  Unknown
> ph.x               000000000040ECE9  Unknown               Unknown  Unknown
>
> I am very curious about whether you think this may be related or totally
> unrelated. Thanks, Natalie Holzwarth
>
> On Tue, Aug 7, 2018 at 8:18 AM, Davide Ceresoli <davide.ceresoli at cnr.it>
> wrote:
>
>> Dear Ben,
>>     I'm afraid it's a problem with MKL-blas ZDOTC, which must
>> return a complex(dp) result. Very strange, because if you grep
>> the source code, we have declared it: complex(dp), external::zdotc
>>
>> Can you tell us your compiler and MKL version? Can you add
>> DFLAGS+=-Dzdotc=zdotc_wrapper
>> to the QE make.inc and recompile both (QE and GIPAW)?
>>
>> Best wishes,
>>     Davide
>>
>>
>> On 08/06/2018 03:19 PM, Ben Comer wrote:
>>
>>> Hello,
>>>
>>> I've been trying to get g-factor calculations in GIPAW working for a few
>>> days now. I keep getting a segmentation fault (below) no matter how I
>>> compile it on our cluster. Does anyone have a good idea of what might be
>>> causing this?
>>>
>>>
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> Image              PC                Routine            Line     Source
>>> gipaw.x            0000000000C40604  Unknown               Unknown  Unknown
>>> libpthread-2.12.s  000000328DE0F7E0  Unknown               Unknown  Unknown
>>> libmkl_avx2.so     00002AAAB7DA5CA3  mkl_blas_avx2_zdo     Unknown  Unknown
>>>
>>> Thanks,
>>>
>>> Ben Comer
>>>
>>> Georgia Tech
>>>
>>>
>>>
>>>
>> --
>> +--------------------------------------------------------------+
>>   Davide Ceresoli
>>   CNR Institute of Molecular Science and Technology (CNR-ISTM)
>>   c/o University of Milan, via Golgi 19, 20133 Milan, Italy
>>   Email: davide.ceresoli at istm.cnr.it
>>   Phone: +39-02-50314276, +39-347-1001570 (mobile)
>>   Skype: dceresoli
>> +--------------------------------------------------------------+
>> _______________________________________________
>> users mailing list
>> users at lists.quantum-espresso.org
>> https://lists.quantum-espresso.org/mailman/listinfo/users