[Pw_forum] segmentation fault
Iurii Timrov
itimrov at sissa.it
Mon Aug 17 12:38:39 CEST 2015
Dear Vishal Gupta,
Which version of Quantum ESPRESSO do you use?
Do you use an SVN version downloaded later than July 6th, 2015
(revision r11608 or later)? If yes, then you may have the same problem
as I do. My problem is due to the commit of July 6th (r11608), and it
occurs when I run QE on FERMI @ CINECA (BlueGene/Q architecture, an
Italian HPC system) with 2048 cores (there is no problem with 1024
cores):
http://qe-forge.org/gf/project/q-e/scmsvn/?action=browse&path=%2Ftrunk%2Fespresso%2FModules%2Fmp_world.f90&r1=11607&r2=11608
At the very beginning of the run, there is a message:
4466173:ibm.runjob.client.Job: terminated by signal 11
4466173:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 2031
and the code crashes without producing any output. However, the
problem did not occur on the other HPC systems I use.
I solved the problem by reverting to revision 11607, which amounts to
changing the routine Modules/mp_world.f90 back from
CALL MPI_Init_thread(MPI_THREAD_MULTIPLE, PROVIDED, ierr)
to
CALL mpi_init_thread(MPI_THREAD_FUNNELED,PROVIDED,ierr)
You may try making the same change in mp_world.f90 and testing the
code again.
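For reference, here is a minimal, self-contained sketch (not the
actual Modules/mp_world.f90, which also sets up the world
communicator) of the reverted initialization, together with a check of
the thread level the MPI library actually grants; the program name and
the warning message are illustrative only.

   PROGRAM check_thread_level
     ! Minimal sketch: request the r11607-style MPI_THREAD_FUNNELED
     ! level (only the master thread makes MPI calls) instead of the
     ! stricter MPI_THREAD_MULTIPLE introduced in r11608, and report
     ! what the library actually provides.
     IMPLICIT NONE
     INCLUDE 'mpif.h'
     INTEGER :: provided, ierr, rank

     CALL MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
     CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

     ! Some MPI builds grant a lower thread level than requested;
     ! printing it from rank 0 makes a mismatch visible at the start
     ! of the run.
     IF (rank == 0 .AND. provided < MPI_THREAD_FUNNELED) THEN
        WRITE(*,*) 'Warning: provided thread level is only ', provided
     END IF

     CALL MPI_Finalize(ierr)
   END PROGRAM check_thread_level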
HTH
Best regards,
Iurii Timrov
Postdoctoral Researcher
SISSA - International School for Advanced Studies
Condensed Matter Sector
Via Bonomea n. 265,
Trieste 34151, Italy
On 2015-08-14 19:32, Axel Kohlmeyer wrote:
> On Fri, Aug 14, 2015 at 1:20 PM, Vishal Gupta
> <vishal.gupta at iitrpr.ac.in> wrote:
>> Sorry, I should've mentioned.
>> I asked them, but they said there might be something wrong with the
>> QE input file. If that were the case, the file shouldn't have been
>> running fine with 7 processors, but it is. Could there really be
>> something wrong with the input file?
>
> sysadmins often say this, so they don't have to check it out, or when
> they don't know what they are doing. if they *know* that there is
> something wrong with the input, then they should at the very least
> tell you what it is.
>
> but i agree that if it works with fewer processors, it should work
> with more, unless you are using some very unusual settings when
> launching the job. more likely is that you are running out of memory
> on the machine or are hitting a stack size limit or something
> similar. your system manager(s) should be able to figure this out
> and/or advise you on how to run so that you use less memory, with a
> hybrid MPI plus OpenMP parallelization, or whatever else is possible
> on the specific machine.
>
> in any case, it doesn't really sound like a QE problem.
>
> axel.
>
>
>> Sorry if these are stupid questions, but I am a little new at this.
>> Vishal Gupta
>> B.Tech. 3rd year Mechanical
>> Indian Institute of Technology Ropar
>> Rupnagar (140001), Punjab, India.
>> Email :- vishal.gupta at iitrpr.ac.in
>>
>> On Fri, Aug 14, 2015 at 10:32 PM, Axel Kohlmeyer <akohlmey at gmail.com>
>> wrote:
>>>
>>> On Fri, Aug 14, 2015 at 12:58 PM, Vishal Gupta
>>> <vishal.gupta at iitrpr.ac.in> wrote:
>>> > I've been running an SCF calculation for an fcc Ni system on a
>>> > high-performance cluster. The job runs fine with 7 or fewer
>>> > processors, but it always leads to a segmentation fault if the
>>> > number of processors exceeds 7. The job takes 4-5 days to run.
>>> > Is there any way to increase the number of processors so that it
>>> > doesn't lead to the error
>>> > "mpirun noticed that process rank 0 with PID 6353 on node c7c
>>> > exited on signal 11 (Segmentation fault)."
>>> > or to excessive memory leakage?
>>>
>>> that is really a question you should ask the system manager(s) or
>>> user support people of the machine that you are running on.
>>>
>>> axel.
>>>
>>>
>>> >
>>> > Thank You
>>> > Vishal Gupta
>>> > B.Tech. 3rd year Mechanical
>>> > Indian Institute of Technology Ropar
>>> > Rupnagar (140001), Punjab, India.
>>> > Email :- vishal.gupta at iitrpr.ac.in
>>> >
>>>
>>>
>>>
>>> --
>>> Dr. Axel Kohlmeyer akohlmey at gmail.com http://goo.gl/1wk0
>>> College of Science & Technology, Temple University, Philadelphia PA,
>>> USA
>>> International Centre for Theoretical Physics, Trieste. Italy.