[QE-users] GIPAW Segmentation Fault
Holzwarth, Natalie
natalie at wfu.edu
Tue Aug 7 18:51:23 CEST 2018
I am not sure what the output is for that particular line, but the last
part of the output is given below. The iteration at which the run stops
changes from run to run, but the boxed error message returned
by phonon.f90 is always the same. We have seen the error in both versions
6.2.1 and 6.3 of the QE package. If this is of interest, I will be glad
to send any additional information. The run is for NaCl. Thanks,
Natalie
------------- end of phonon output file:
Representation # 3 mode # 3
Self-consistent Calculation
iter # 1 total cpu time : 1573.2 secs av.it.: 6.9
thresh= 1.000E-02 alpha_mix = 0.700 |ddv_scf|^2 = 1.571E-05
iter # 2 total cpu time : 1602.0 secs av.it.: 11.4
thresh= 3.964E-04 alpha_mix = 0.700 |ddv_scf|^2 = 1.095E-05
iter # 3 total cpu time : 1629.0 secs av.it.: 10.6
thresh= 3.308E-04 alpha_mix = 0.700 |ddv_scf|^2 = 3.191E-08
iter # 4 total cpu time : 1657.8 secs av.it.: 11.2
thresh= 1.786E-05 alpha_mix = 0.700 |ddv_scf|^2 = 2.458E-10
iter # 5 total cpu time : 1687.6 secs av.it.: 11.9
thresh= 1.568E-06 alpha_mix = 0.700 |ddv_scf|^2 = 2.328E-11
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
N. A. W. Holzwarth
Department of Physics
Wake Forest University
Winston-Salem, NC 27109 USA
email: natalie at wfu.edu
web: http://www.wfu.edu/~natalie
phone: 1-336-758-5510
office: Rm. 300 Olin Physical Lab
On Tue, Aug 7, 2018 at 12:00 PM, Pietro Delugas <pdelugas at sissa.it> wrote:
> hello
>
> what is the output of
>
> addr2line -p -e ph.x 00000000004BE229
> and what version of ph are you using ?
>
>
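The addr2line lookup requested above can be run over every ph.x frame in a saved backtrace rather than a single address. The following is only a minimal sketch: the helper names `extract_frames` and `resolve_frames` and the file `trace.txt` are hypothetical, and it assumes the trace has the forrtl layout shown in this thread (image name first, address second) and that ph.x was built with debugging symbols (`-g`), since otherwise addr2line can only report `??`.

```shell
# Print the PC column for every frame that belongs to the given image.
# Leading "> " quote markers (as in a forwarded e-mail) are stripped first.
extract_frames() {
    image="$1"
    sed 's/^[> ]*//' | awk -v img="$image" '$1 == img { print $2 }'
}

# Resolve each address read from stdin against the binary.
# -f adds the function name, -p keeps each result on one line.
resolve_frames() {
    binary="$1"
    while read -r addr; do
        addr2line -f -p -e "$binary" "$addr"
    done
}

# Usage (trace.txt is a hypothetical file holding the saved backtrace):
#   extract_frames ph.x < trace.txt | resolve_frames ./ph.x
```

Resolving all frames at once shows the whole call chain leading to the MPI call, which may be more informative than one address when the crash location varies between runs.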
> On 07/08/2018 17:06, Holzwarth, Natalie wrote:
>
> This segmentation fault issue has also appeared for us in another QE
> code. Perhaps it is a totally unrelated problem; in our case it seems to be
> tied to the openmpi package compiled with intel-3.1.1-2018 and
> intel-3.1.0-2018. Compiling with the openmpi package compiled with
> intel-2.1.0-2018 usually (not always) solves the problem. Compiling with a
> much older openmpi package solves the problem, but is not viable for the
> current configuration of our cluster. Since no one else has mentioned the
> problem until now, we think it may have to do with our internal setup. The
> error is very intermittent, occurring at different places in the code for
> the same input. What this error has in common with the one in the original
> message is libpthread-2.12.so. Most reliably we see the error in ph.x
> with part of the error message:
>
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image PC Routine Line Source
> ph.x 0000000000D99A1D for__signal_handl Unknown Unknown
> libpthread-2.12.s 0000003271E0F7E0 Unknown Unknown Unknown
> mca_btl_vader.so 00002AB74BBB99A7 Unknown Unknown Unknown
> libopen-pal.so.40 00002AB738AD3A54 opal_progress Unknown Unknown
> libmpi.so.40.10.1 00002AB7384DBC04 ompi_request_defa Unknown Unknown
> libmpi.so.40.10.1 00002AB7385384C5 ompi_coll_base_ba Unknown Unknown
> libmpi.so.40.10.1 00002AB7384F26F1 MPI_Barrier Unknown Unknown
> libmpi_mpifh.so.4 00002AB73826D013 MPI_Barrier_f08 Unknown Unknown
> ph.x 0000000000BA9E0E Unknown Unknown Unknown
> ph.x 0000000000B9835B Unknown Unknown Unknown
> ph.x 000000000057FE26 Unknown Unknown Unknown
> ph.x 00000000004BE229 Unknown Unknown Unknown
> ph.x 00000000004A0F10 Unknown Unknown Unknown
> ph.x 0000000000415A65 Unknown Unknown Unknown
> ph.x 000000000040EE73 Unknown Unknown Unknown
> ph.x 000000000040EDDE Unknown Unknown Unknown
> libc-2.12.so 000000327161ED1D __libc_start_main Unknown Unknown
> ph.x 000000000040ECE9 Unknown Unknown Unknown
>
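Since the trace above passes through libmpi and libopen-pal, it may help to confirm which MPI (and MKL) shared libraries a given ph.x binary actually resolves at run time. A minimal sketch, assuming a dynamically linked binary; the helper name `filter_mpi_mkl` is hypothetical, and libraries loaded later as plugins (such as mca_btl_vader.so) will not appear in `ldd` output.

```shell
# Keep only the MPI, Open PAL, and MKL entries from ldd-style output.
filter_mpi_mkl() {
    grep -i -E 'libmpi|libopen-pal|libmkl'
}

# Usage: list the resolved library paths for the binary under test.
#   ldd ./ph.x | filter_mpi_mkl
```

Comparing this list between a build that crashes and one that does not is one way to check whether the two really pick up different openmpi installations.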
> I am very curious about whether you think this may be related or totally
> unrelated. Thanks, Natalie Holzwarth
>
> On Tue, Aug 7, 2018 at 8:18 AM, Davide Ceresoli <davide.ceresoli at cnr.it>
> wrote:
>
>> Dear Ben,
>> I'm afraid it's a problem with MKL-blas ZDOTC, which must
>> return a complex(dp) result. Very strange, because if you grep
>> the source code, we have declared it: complex(dp), external :: zdotc
>>
>> Can you tell us your compiler and MKL version? Can you also add
>> DFLAGS+=-Dzdotc=zdotc_wrapper
>> to the QE make.inc and recompile both (QE and GIPAW)?
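The make.inc change suggested above can be applied from the shell so that it is not added twice by accident. This is only a sketch: the function name `add_zdotc_wrapper_flag` and the path in the usage line are hypothetical, and it assumes that appending the line at the end of make.inc is enough for your build, which depends on how your make.inc uses DFLAGS.

```shell
# Append the define suggested in this thread to a make.inc file,
# unless exactly that line is already present.
add_zdotc_wrapper_flag() {
    makeinc="$1"
    flag='DFLAGS+=-Dzdotc=zdotc_wrapper'
    if ! grep -qxF -e "$flag" "$makeinc"; then
        printf '%s\n' "$flag" >> "$makeinc"
    fi
}

# Usage (the path is a placeholder for your QE source tree):
#   add_zdotc_wrapper_flag /path/to/qe/make.inc
```

After editing make.inc, both QE and GIPAW need to be rebuilt from clean trees so that every object file is compiled with the new define.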
>>
>> Best wishes,
>> Davide
>>
>>
>> On 08/06/2018 03:19 PM, Ben Comer wrote:
>>
>>> Hello,
>>>
>>> I've been trying to get g-factor calculations in GIPAW working for a few
>>> days now. I keep getting a segmentation fault (below) no matter how I
>>> compile it on our cluster. Does anyone have a good idea of what might be
>>> causing this?
>>>
>>>
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> Image PC Routine Line Source
>>> gipaw.x 0000000000C40604 Unknown Unknown Unknown
>>> libpthread-2.12.s 000000328DE0F7E0 Unknown Unknown Unknown
>>> libmkl_avx2.so 00002AAAB7DA5CA3 mkl_blas_avx2_zdo Unknown Unknown
>>>
>>> Thanks,
>>>
>>> Ben Comer
>>>
>>> Georgia Tech
>>>
>>>
>>>
>>>
>> --
>> +--------------------------------------------------------------+
>> Davide Ceresoli
>> CNR Institute of Molecular Science and Technology (CNR-ISTM)
>> c/o University of Milan, via Golgi 19, 20133 Milan, Italy
>> Email: davide.ceresoli at istm.cnr.it
>> Phone: +39-02-50314276, +39-347-1001570 (mobile)
>> Skype: dceresoli
>> +--------------------------------------------------------------+
>> _______________________________________________
>> users mailing list
>> users at lists.quantum-espresso.org
>> https://lists.quantum-espresso.org/mailman/listinfo/users