[Pw_forum] A problem with executing pwcond.x in parallel
庞瑞(PANG Rui)
pang.r at sustc.edu.cn
Sun Aug 16 10:38:56 CEST 2015
Dear Paolo
Thanks for your advice. In fact I have examined line 444; however, there is only a summation there, and if I replace it with another operation, write(*,*) for example, the error appears at the same position too. I have sent an email to the cluster manager. If he finds the reason, I will post it.
Sincerely
Pang Rui
------------------
庞瑞(PANG Rui)
South University of Science and Technology of China/Department of Physics
No. 1088, Xueyuan Road, Shenzhen, Guangdong
------------------ Original ------------------
From: "Paolo Giannozzi"<p.giannozzi at gmail.com>;
Date: Sat, Aug 15, 2015 05:16 PM
To: "PWSCF Forum"<pw_forum at pwscf.org>;
Subject: Re: [Pw_forum] A problem with executing pwcond.x in parallel
In theory, well-designed parallel code should work (or cleanly stop) on any number of processors. In practice, if you use a disproportionate number of processors, the code may not be able to deal with it. In any case: look at what happens at line 444 of PWCOND/src/scatter_forw.f90 and maybe you will understand what is happening and why.
Paolo
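The point about "disproportionate" processor counts can be illustrated with a generic block distribution. The sketch below is a hypothetical Python illustration, not PWCOND's actual data distribution (which this thread does not describe); the names `block_distribution` and `check_distribution` are introduced here for the example. It splits n items over p processes and, in the spirit of "work or cleanly stop", aborts with a message when some process would receive no items instead of letting it fail later.

```python
# Hypothetical illustration only; not PWCOND's actual distribution scheme.

def block_distribution(n_items, n_procs):
    """Return (start, count) for each rank under a simple block distribution.

    The first n_items % n_procs ranks receive one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    layout = []
    start = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)
        layout.append((start, count))
        start += count
    return layout


def check_distribution(n_items, n_procs):
    """Stop cleanly instead of letting ranks with no work crash later."""
    layout = block_distribution(n_items, n_procs)
    empty = [rank for rank, (_, count) in enumerate(layout) if count == 0]
    if empty:
        raise SystemExit(
            f"{n_items} items cannot be split over {n_procs} processes: "
            f"ranks {empty} would get no work; use fewer processes"
        )
    return layout
```

For example, `block_distribution(10, 4)` gives `[(0, 3), (3, 3), (6, 2), (8, 2)]`, while `check_distribution(3, 8)` stops because ranks 3 to 7 would be empty. This only illustrates the general idea; the thread does not establish what is distributed at line 444.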
On Mon, Aug 10, 2015 at 9:27 AM, 庞瑞(PANG Rui) <pang.r at sustc.edu.cn> wrote:
However, I found that I could run the example correctly only when using 8 cores. With 16, 32 or 64 cores, the code stops at
" ngper, shell number = 804 82
ngper, n2d = 804 391
--- E-Ef = 2.0000000 k = 0.5000000 0.5000000
--- ie = 1 ik = 1"
and prints
"forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
pwcond.x 00000000025AD7C9 Unknown Unknown Unknown
pwcond.x 00000000025AC140 Unknown Unknown Unknown
pwcond.x 000000000255DBC2 Unknown Unknown Unknown
pwcond.x 00000000024F15F3 Unknown Unknown Unknown
pwcond.x 00000000024F753B Unknown Unknown Unknown
libpthread.so.0 0000003F2280F710 Unknown Unknown Unknown
pwcond.x 00000000004F9B35 scatter_forw_ 444 scatter_forw.f90
pwcond.x 00000000004ADD66 do_cond_ 518 do_cond.f90
pwcond.x 00000000004A8B42 MAIN__ 22 condmain.f90
pwcond.x 0000000000498036 Unknown Unknown Unknown
libc.so.6 0000003F2241ED1D Unknown Unknown Unknown
pwcond.x 0000000000497F29 Unknown Unknown Unknown"
Could anyone tell me how to fix this error so that I can use more cores to execute pwcond.x? I encountered this problem in both versions 5.1.2 and 5.2.
Thanks very much for any help.
Sincerely.
Pang Rui
_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum
--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222