[QE-users] [SUSPECT ATTACHMENT REMOVED] Re: PWCOND: NAN/ SIGSEGV

Paolo Giannozzi p.giannozzi at gmail.com
Mon Sep 3 21:27:46 CEST 2018


I don't know what may cause this kind of random MPI error, but the errors
in the original post seem to have a less mysterious origin.  Compiling with
bounds checking (-CB for Intel Fortran) yields

forrtl: severe (408): fort: (2): Subscript #1 of the array KVAL1 has value 167 which is greater than the upper bound of 166

Image              PC                Routine            Line        Source
pwcond.x           000000000046E942  jbloch_                   162  jbloch.f90
pwcond.x           0000000000416D57  compbs_                   244  compbs.f90
pwcond.x           00000000004328AA  do_cond_                  520  do_cond.f90
pwcond.x           000000000042AB26  MAIN__                     22  condmain.f90

This is clearly a programming issue: the code in jbloch.f90 indexes KVAL1
one element past its upper bound.
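
As a minimal sketch (not part of Quantum ESPRESSO; the helper name
parse_frames is hypothetical), the frames of such a traceback that carry a
source line number can be extracted as follows. The sample text is the
bounds-check traceback quoted above, with the wrapped "Source" column
rejoined onto one line per frame.

```python
# Sample traceback copied from the bounds-check run reported above.
TRACE = """\
Image              PC                Routine            Line        Source
pwcond.x           000000000046E942  jbloch_                   162  jbloch.f90
pwcond.x           0000000000416D57  compbs_                   244  compbs.f90
pwcond.x           00000000004328AA  do_cond_                  520  do_cond.f90
pwcond.x           000000000042AB26  MAIN__                     22  condmain.f90
"""


def parse_frames(text):
    """Return (routine, source, line) for every frame with a numeric line.

    Hypothetical helper: assumes each frame is one whitespace-separated row
    of five fields (image, PC, routine, line, source). The header row and
    frames whose line is 'Unknown' are skipped.
    """
    frames = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 5 or not parts[3].isdigit():
            continue
        _image, _pc, routine, line, source = parts
        # The compiler appends underscores to routine names; drop them.
        frames.append((routine.rstrip("_"), source, int(line)))
    return frames


for routine, source, line in parse_frames(TRACE):
    print(f"{source}:{line} in {routine}")
```

The first frame printed is the innermost one, that is, the statement where
the out-of-bounds subscript was detected.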

Paolo

On Sat, Sep 1, 2018 at 11:04 PM, Holzwarth, Natalie <natalie at wfu.edu> wrote:

> I have chimed in on the Quantum ESPRESSO mailing list a few times noting a
> similar problem, characterized by an intermittent segmentation fault while
> running pw.x and ph.x. On our system, which runs the Red Hat operating
> system (RHEL6u9) and Intel 2018 compilers, we see the segmentation fault
> when using OpenMPI 3.1.1 and 3.1.0 compiled with Intel 2018. When we
> use OpenMPI 2.1.0, the problem does not appear as often. In our case,
> libpthread is always listed in the error trace. The specific error
> message that we get from a ph.x example is pasted below, and the run script
> and UPF are attached, just in case this is useful information. Thanks,
> Natalie
>
> -----------error from ph.x run------------------
> Image              PC                Routine               Line     Source
> ph.x               0000000000D99A1D  for__signal_handl     Unknown  Unknown
> libpthread-2.12.s  0000003271E0F7E0  Unknown               Unknown  Unknown
> mca_btl_vader.so   00002AB74BBB99A7  Unknown               Unknown  Unknown
> libopen-pal.so.40  00002AB738AD3A54  opal_progress         Unknown  Unknown
> libmpi.so.40.10.1  00002AB7384DBC04  ompi_request_defa     Unknown  Unknown
> libmpi.so.40.10.1  00002AB7385384C5  ompi_coll_base_ba     Unknown  Unknown
> libmpi.so.40.10.1  00002AB7384F26F1  MPI_Barrier           Unknown  Unknown
> libmpi_mpifh.so.4  00002AB73826D013  MPI_Barrier_f08       Unknown  Unknown
> ph.x               0000000000BA9E0E  Unknown               Unknown  Unknown
> ph.x               0000000000B9835B  Unknown               Unknown  Unknown
> ph.x               000000000057FE26  Unknown               Unknown  Unknown
> ph.x               00000000004BE229  Unknown               Unknown  Unknown
> ph.x               00000000004A0F10  Unknown               Unknown  Unknown
> ph.x               0000000000415A65  Unknown               Unknown  Unknown
> ph.x               000000000040EE73  Unknown               Unknown  Unknown
> ph.x               000000000040EDDE  Unknown               Unknown  Unknown
> libc-2.12.so       000000327161ED1D  __libc_start_main     Unknown  Unknown
> ph.x               000000000040ECE9  Unknown               Unknown  Unknown
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[24484,1],12]
>   Exit code:    174
> --------------------------------------------------------------------------
>
>
> N. A. W. Holzwarth              email: natalie at wfu.edu
> Department of Physics           web: http://www.wfu.edu/~natalie
> Wake Forest University          phone: 1-336-758-5510
> Winston-Salem, NC 27109 USA     office: Rm. 300 Olin Physical Lab
>
> On Fri, Aug 31, 2018 at 5:32 AM, Ankit Jain <ajain at fysik.dtu.dk> wrote:
>
>> Hello Subrata,
>>
>> setting 'ulimit -u unlimited' does not help.
>>
>> Thanks,
>> Ankit Jain
>>
>> On 31 Aug 2018, at 11.17, Subrata Jana <subrata.jana at niser.ac.in> wrote:
>>
>> Hi,
>>
>> This error has also been observed when the compiler version loaded at
>> run time differs from the one used to compile the code. The suggested fix
>> is to rebuild everything. Please also try this:
>>
>> ftp://ftp.iitb.ac.in/LDP/en/solrhe/ch06s10.html
>>
>> With Regards,
>> SJ
>>
>>
>> SUBRATA JANA
>> Research Scholar
>> School of Physical Sciences, National Institute of Science Education and
>> Research (NISER), Bhubaneswar
>> PO- Bhimpur-Padanpur, Via- Jatni, District:- Khurda
>> PIN – 752050, Odisha, INDIA
>>
>> On Fri, Aug 31, 2018 at 2:14 PM, Ankit Jain <ajain at fysik.dtu.dk> wrote:
>>
>>> Dear All,
>>>
>>> I am new to PWCOND calculations and I created my input files following
>>> the provided examples.
>>> I am trying to do a conductance calculation for a metal-conductor-metal
>>> system, and I am running into a SIGSEGV error.
>>>
>>> Things I tried:
>>> - running in serial vs. parallel and on larger-memory machines (16 CPUs
>>> with 128 GB memory).
>>> - changing ikind in the pwcond.in input from 1 to 2, as my right and
>>> left leads are the same material.
>>> - setting ikind = 2 and bdr = 40 in the input to pwcond.x (40 is my
>>> system size in the z-direction).
>>> - setting ikind = 2, bdl = 10 and bds = 30 in the pwcond.x input file.
>>> In this case, the program does not crash but returns NaN as the non-zero
>>> value of the transmittance.
>>>
>>> My scf.in, pwcond.in, scf.out and pwcond.out files are attached. The
>>> program (pwcond.x) dies with the following error:
>>>
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> Image              PC                Routine               Line     Source
>>> pwcond.x           0000000000BA019D  Unknown               Unknown  Unknown
>>> libpthread-2.17.s  00007F841B50D6D0  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F4595  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F42D4  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F5F16  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F6215  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F6137  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F60EF  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F918F  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2F8F3D  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2ED4A3  Unknown               Unknown  Unknown
>>> libiomp5.so        00007F841A2EFD9E  Unknown               Unknown  Unknown
>>> pwcond.x           0000000000BE1FAA  Unknown               Unknown  Unknown
>>> pwcond.x           0000000000418405  compbs_                   439  compbs.f90
>>> pwcond.x           0000000000425A75  do_cond_                  520  do_cond.f90
>>> pwcond.x           000000000042096F  MAIN__                     22  condmain.f90
>>> pwcond.x           000000000040E2EE  Unknown               Unknown  Unknown
>>> libc-2.17.so       00007F841B153445  __libc_start_main     Unknown  Unknown
>>> pwcond.x           000000000040E1E9  Unknown               Unknown  Unknown
>>>
>>>
>>> Thank You,
>>>
>>> Ankit Jain
>>> Postdoctoral Scholar,
>>> DTU Physics,
>>> Denmark.
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users at lists.quantum-espresso.org
>>> https://lists.quantum-espresso.org/mailman/listinfo/users
>>>
>>
>>
>
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222