[QE-users] [SUSPECT ATTACHMENT REMOVED] Re: PWCOND: NAN/ SIGSEGV

Holzwarth, Natalie natalie at wfu.edu
Sat Sep 1 23:04:14 CEST 2018


I have chimed in on the Quantum ESPRESSO mailing list a few times about a
similar problem: an intermittent segmentation fault while running pw.x and
ph.x. Our system runs Red Hat Enterprise Linux (RHEL6u9) with the Intel 2018
compilers, and we see the segmentation fault when using OpenMPI 3.1.1 and
3.1.0 compiled with Intel 2018. When we use OpenMPI 2.1.0, the problem does
not appear as often. In our case, libpthread is always listed in the error
trace. The specific error message that we get from a ph.x example is pasted
below, and the run script and UPF are attached, just in case this is useful
information.

Thanks,
Natalie

-----------error from ph.x run------------------
Image              PC                Routine            Line        Source
ph.x               0000000000D99A1D  for__signal_handl     Unknown  Unknown
libpthread-2.12.s  0000003271E0F7E0  Unknown               Unknown  Unknown
mca_btl_vader.so   00002AB74BBB99A7  Unknown               Unknown  Unknown
libopen-pal.so.40  00002AB738AD3A54  opal_progress         Unknown  Unknown
libmpi.so.40.10.1  00002AB7384DBC04  ompi_request_defa     Unknown  Unknown
libmpi.so.40.10.1  00002AB7385384C5  ompi_coll_base_ba     Unknown  Unknown
libmpi.so.40.10.1  00002AB7384F26F1  MPI_Barrier           Unknown  Unknown
libmpi_mpifh.so.4  00002AB73826D013  MPI_Barrier_f08       Unknown  Unknown
ph.x               0000000000BA9E0E  Unknown               Unknown  Unknown
ph.x               0000000000B9835B  Unknown               Unknown  Unknown
ph.x               000000000057FE26  Unknown               Unknown  Unknown
ph.x               00000000004BE229  Unknown               Unknown  Unknown
ph.x               00000000004A0F10  Unknown               Unknown  Unknown
ph.x               0000000000415A65  Unknown               Unknown  Unknown
ph.x               000000000040EE73  Unknown               Unknown  Unknown
ph.x               000000000040EDDE  Unknown               Unknown  Unknown
libc-2.12.so       000000327161ED1D  __libc_start_main     Unknown  Unknown
ph.x               000000000040ECE9  Unknown               Unknown  Unknown
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[24484,1],12]
  Exit code:    174
--------------------------------------------------------------------------
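
A trace like the one above can be summarized by counting how many frames each image contributes, which makes it easy to check whether libpthread or an MPI component shows up in every failure. The following is only a rough sketch: the function name count_images and the regular expression are illustrative assumptions based on the layout of the trace shown here, not part of Quantum ESPRESSO or of the Intel runtime.

```python
import re
from collections import Counter

# One traceback frame as laid out above: image name, a 16-digit
# hexadecimal program counter, then the remaining columns.
# (Assumed layout, inferred from the pasted trace only.)
FRAME = re.compile(r"^\s*(\S+)\s+([0-9A-Fa-f]{16})\s+(.*)$")


def count_images(trace_text):
    """Return a Counter mapping each image name to its number of frames."""
    counts = Counter()
    for line in trace_text.splitlines():
        line = line.lstrip("> ")  # tolerate mail quoting such as '>> '
        match = FRAME.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# First few frames of the ph.x trace, copied from the message above.
SAMPLE = """\
ph.x               0000000000D99A1D  for__signal_handl     Unknown  Unknown
libpthread-2.12.s  0000003271E0F7E0  Unknown               Unknown  Unknown
mca_btl_vader.so   00002AB74BBB99A7  Unknown               Unknown  Unknown
libopen-pal.so.40  00002AB738AD3A54  opal_progress         Unknown  Unknown
libmpi.so.40.10.1  00002AB7384DBC04  ompi_request_defa     Unknown  Unknown
"""

if __name__ == "__main__":
    for image, frames in count_images(SAMPLE).most_common():
        print(f"{image:20s} {frames}")
```

Running this over several saved traces and comparing the counts would show which images recur from one failure to the next.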

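Because the fault is intermittent, one way to compare MPI builds is to repeat the same job several times and record how often it exits with a non-zero status. The sketch below assumes nothing about the actual run script: failure_rate is a hypothetical helper, and the command shown is a stand-in that simply exits with status 174 so the example runs anywhere. In practice the command list would be replaced by the real mpirun invocation.

```python
import subprocess
import sys


def failure_rate(cmd, runs):
    """Run cmd `runs` times; return the fraction of runs with non-zero exit."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration): a Python child
    # process that always exits with status 174, the exit code reported above.
    placeholder = [sys.executable, "-c", "import sys; sys.exit(174)"]
    print(failure_rate(placeholder, runs=3))
```

A fraction measured over enough runs gives a more concrete comparison between builds than "does not appear as often".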

N. A. W. Holzwarth                email: natalie at wfu.edu
Department of Physics             web: http://www.wfu.edu/~natalie
Wake Forest University            phone: 1-336-758-5510
Winston-Salem, NC 27109 USA       office: Rm. 300 Olin Physical Lab

On Fri, Aug 31, 2018 at 5:32 AM, Ankit Jain <ajain at fysik.dtu.dk> wrote:

> Hello Subrata,
>
> setting 'ulimit -u unlimited' does not help.
>
> Thanks,
> Ankit Jain
>
> On 31 Aug 2018, at 11.17, Subrata Jana <subrata.jana at niser.ac.in> wrote:
>
> Hi,
>
> This error has also been observed when the compiler version that is loaded
> differs from the one used to compile the code. The suggestion was to
> rebuild everything. Please also try this:
>
> ftp://ftp.iitb.ac.in/LDP/en/solrhe/ch06s10.html
>
> With Regards,
> SJ
>
>
> --------------------------------------------------------------
> SUBRATA JANA
> Research Scholar
> School of Physical Sciences
> National Institute of Science Education and Research (NISER), Bhubaneswar
> PO- Bhimpur-Padanpur, Via- Jatni, District:- Khurda
> PIN – 752050, Odisha, INDIA
>
> On Fri, Aug 31, 2018 at 2:14 PM, Ankit Jain <ajain at fysik.dtu.dk> wrote:
>
>> Dear All,
>>
>> I am new to PWCOND calculations and created my input files following
>> the provided examples.
>> I am trying to do a conductance calculation for a metal-conductor-metal
>> system, and I am running into a SIGSEGV error.
>>
>> Things I tried:
>> - running in serial vs. parallel, and on larger-memory machines (16 CPUs
>> with 128 GB of memory).
>> - changing ikind in the pwcond.in input from 1 to 2, as my right and left
>> leads are the same material.
>> - setting ikind = 2 and bdr = 40 in the input to pwcond.x (40 is my
>> system size in the z-direction).
>> - setting ikind = 2, bdl = 10, and bds = 30 in the pwcond.x input file. In
>> this case, the program does not crash but returns NaN as the non-zero
>> value of transmittance.
>>
>> My scf.in, pwcond.in, scf.out and pwcond.out files are attached. The
>> program (pwcond.x) dies with the following error:
>>
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>> Image              PC                Routine            Line        Source
>> pwcond.x           0000000000BA019D  Unknown               Unknown  Unknown
>> libpthread-2.17.s  00007F841B50D6D0  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F4595  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F42D4  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F5F16  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F6215  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F6137  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F60EF  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F918F  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2F8F3D  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2ED4A3  Unknown               Unknown  Unknown
>> libiomp5.so        00007F841A2EFD9E  Unknown               Unknown  Unknown
>> pwcond.x           0000000000BE1FAA  Unknown               Unknown  Unknown
>> pwcond.x           0000000000418405  compbs_                   439  compbs.f90
>> pwcond.x           0000000000425A75  do_cond_                  520  do_cond.f90
>> pwcond.x           000000000042096F  MAIN__                     22  condmain.f90
>> pwcond.x           000000000040E2EE  Unknown               Unknown  Unknown
>> libc-2.17.so       00007F841B153445  __libc_start_main     Unknown  Unknown
>> pwcond.x           000000000040E1E9  Unknown               Unknown  Unknown
>>
>>
>> Thank You,
>>
>> Ankit Jain
>> Postdoctoral Scholar,
>> DTU Physics,
>> Denmark.
>>
>> _______________________________________________
>> users mailing list
>> users at lists.quantum-espresso.org
>> https://lists.quantum-espresso.org/mailman/listinfo/users
>>