[QE-users] [SUSPECT ATTACHMENT REMOVED] Re: PWCOND: NAN/ SIGSEGV
Holzwarth, Natalie
natalie at wfu.edu
Sat Sep 1 23:04:14 CEST 2018
I have chimed in on the Quantum ESPRESSO mailing list a few times about a
similar problem: an intermittent segmentation fault while running pw.x and
ph.x. Our system runs Red Hat Enterprise Linux (RHEL6u9) with the Intel 2018
compilers, and we see the segmentation fault when using OpenMPI 3.1.1 and
3.1.0 compiled with Intel 2018. When we use OpenMPI 2.1.0, the problem
appears less often. In our case, libpthread is always listed in the error
trace. The specific error message from a ph.x example is pasted below; the
run script and UPF are attached in case this information is useful. Thanks,
Natalie
-----------error from ph.x run------------------
Image              PC                Routine            Line     Source
ph.x               0000000000D99A1D  for__signal_handl  Unknown  Unknown
libpthread-2.12.s  0000003271E0F7E0  Unknown            Unknown  Unknown
mca_btl_vader.so   00002AB74BBB99A7  Unknown            Unknown  Unknown
libopen-pal.so.40  00002AB738AD3A54  opal_progress      Unknown  Unknown
libmpi.so.40.10.1  00002AB7384DBC04  ompi_request_defa  Unknown  Unknown
libmpi.so.40.10.1  00002AB7385384C5  ompi_coll_base_ba  Unknown  Unknown
libmpi.so.40.10.1  00002AB7384F26F1  MPI_Barrier        Unknown  Unknown
libmpi_mpifh.so.4  00002AB73826D013  MPI_Barrier_f08    Unknown  Unknown
ph.x               0000000000BA9E0E  Unknown            Unknown  Unknown
ph.x               0000000000B9835B  Unknown            Unknown  Unknown
ph.x               000000000057FE26  Unknown            Unknown  Unknown
ph.x               00000000004BE229  Unknown            Unknown  Unknown
ph.x               00000000004A0F10  Unknown            Unknown  Unknown
ph.x               0000000000415A65  Unknown            Unknown  Unknown
ph.x               000000000040EE73  Unknown            Unknown  Unknown
ph.x               000000000040EDDE  Unknown            Unknown  Unknown
libc-2.12.so       000000327161ED1D  __libc_start_main  Unknown  Unknown
ph.x               000000000040ECE9  Unknown            Unknown  Unknown
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [[24484,1],12]
Exit code: 174
--------------------------------------------------------------------------
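The observation that libpthread appears in every trace can be checked mechanically when several crash logs are available. The following is only a minimal sketch in Python, not part of Quantum ESPRESSO or OpenMPI: the function name `images_in_traceback` is hypothetical, and it assumes that each frame line starts with the image name followed by a 16-digit hexadecimal program counter, as in the trace above. The sample lines are copied from that trace.

```python
from collections import Counter

HEX_DIGITS = set("0123456789abcdefABCDEF")


def images_in_traceback(text):
    """Count how often each image (first column) appears in a traceback.

    Assumption: a frame line is "<image> <16-hex-digit PC> ...". Header
    lines and mail quote markers ("> ", ">> ") are skipped or stripped.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.lstrip("> ").split()
        if len(fields) < 2:
            continue
        pc = fields[1]
        if len(pc) == 16 and set(pc) <= HEX_DIGITS:
            counts[fields[0]] += 1
    return counts


sample = """\
Image PC Routine Line Source
ph.x 0000000000D99A1D for__signal_handl Unknown Unknown
libpthread-2.12.s 0000003271E0F7E0 Unknown Unknown Unknown
mca_btl_vader.so 00002AB74BBB99A7 Unknown Unknown Unknown
libopen-pal.so.40 00002AB738AD3A54 opal_progress Unknown Unknown
libmpi.so.40.10.1 00002AB7384F26F1 MPI_Barrier Unknown Unknown
"""

counts = images_in_traceback(sample)
for image, n in counts.most_common():
    print(image, n)
```

Running the same function over the traces from different OpenMPI builds would show whether the set of libraries involved changes between them.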
N. A. W. Holzwarth             email:  natalie at wfu.edu
Department of Physics          web:    http://www.wfu.edu/~natalie
Wake Forest University         phone:  1-336-758-5510
Winston-Salem, NC 27109 USA    office: Rm. 300 Olin Physical Lab
On Fri, Aug 31, 2018 at 5:32 AM, Ankit Jain <ajain at fysik.dtu.dk> wrote:
> Hello Subrata,
>
> Setting 'ulimit -u unlimited' does not help.
>
> Thanks,
> Ankit Jain
>
> On 31 Aug 2018, at 11.17, Subrata Jana <subrata.jana at niser.ac.in> wrote:
>
> Hi,
>
> This error has also been observed when the loaded compiler version differs
> from the one used to compile the code. The suggested fix was to rebuild
> everything. Please also try this:
>
> ftp://ftp.iitb.ac.in/LDP/en/solrhe/ch06s10.html
>
> With Regards,
> SJ
>
>
> SUBRATA JANA
> Research Scholar
> School of Physical Sciences, National Institute of Science Education and
> Research (NISER), Bhubaneswar
> PO- Bhimpur-Padanpur, Via- Jatni, District:- Khurda
> PIN – 752050, Odisha, INDIA
>
> On Fri, Aug 31, 2018 at 2:14 PM, Ankit Jain <ajain at fysik.dtu.dk> wrote:
>
>> Dear All,
>>
>> I am new to PWCOND calculations and created my input files following
>> the provided examples.
>> I am trying to do a conductance calculation for a metal-conductor-metal
>> system, and I am running into a SIGSEGV error.
>>
>> Things I tried:
>> - running in serial vs. parallel and on larger-memory machines (16 CPUs
>> with 128 GB memory).
>> - changing ikind in the pwcond.in input from 1 to 2, as my right and left
>> leads are the same material.
>> - setting ikind = 2 and bdr = 40 in the input to pwcond.x (40 is my
>> system size in the z-direction).
>> - setting ikind = 2, bdl = 10, and bds = 30 in the pwcond.x input file. In
>> this case, the program does not crash but returns NaN in place of non-zero
>> transmittance values.
>>
>> My scf.in, pwcond.in, scf.out and pwcond.out files are attached. The
>> program (pwcond.x) dies with the following error:
>>
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>> Image              PC                Routine            Line     Source
>> pwcond.x           0000000000BA019D  Unknown            Unknown  Unknown
>> libpthread-2.17.s  00007F841B50D6D0  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F4595  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F42D4  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F5F16  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F6215  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F6137  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F60EF  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F918F  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2F8F3D  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2ED4A3  Unknown            Unknown  Unknown
>> libiomp5.so        00007F841A2EFD9E  Unknown            Unknown  Unknown
>> pwcond.x           0000000000BE1FAA  Unknown            Unknown  Unknown
>> pwcond.x           0000000000418405  compbs_            439      compbs.f90
>> pwcond.x           0000000000425A75  do_cond_           520      do_cond.f90
>> pwcond.x           000000000042096F  MAIN__             22       condmain.f90
>> pwcond.x           000000000040E2EE  Unknown            Unknown  Unknown
>> libc-2.17.so       00007F841B153445  __libc_start_main  Unknown  Unknown
>> pwcond.x           000000000040E1E9  Unknown            Unknown  Unknown
>>
>>
>> Thank You,
>>
>> Ankit Jain
>> Postdoctoral Scholar,
>> DTU Physics,
>> Denmark.
>>
>> _______________________________________________
>> users mailing list
>> users at lists.quantum-espresso.org
>> https://lists.quantum-espresso.org/mailman/listinfo/users
>>