[Pw_forum] A problem with parallel execution of pwcond.x

庞瑞(PANG Rui) pang.r at sustc.edu.cn
Sun Aug 16 10:38:56 CEST 2015


Dear Paolo

Thanks for your advice. In fact, I have already examined line 444. However, it contains only a summation, and if I replace it with another operation (write(*,*), for example), the error appears at the same position. I have sent an email to the cluster manager, and if he finds the reason, I will post it.

Sincerely

Pang Rui
 




------------------


庞瑞(PANG Rui)



South University of Science and Technology of China/Department of Physics

No.1088,Xueyuan Road, Shenzhen,Guangdong

------------------ Original ------------------
From:  "Paolo Giannozzi"<p.giannozzi at gmail.com>;
Date:  Sat, Aug 15, 2015 05:16 PM
To:  "PWSCF Forum"<pw_forum at pwscf.org>; 

Subject:  Re: [Pw_forum] A problem with parallel execution of pwcond.x

 
In theory, well-designed parallel code should work (or cleanly stop) on any number of processors. In practice, if you use a disproportionate number of processors, the code may not be able to deal with it. In any case, look at what happens at line 444 of PWCOND/src/scatter_forw.f90 and maybe you will understand what is happening and why.
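To illustrate Paolo's general point, here is a hypothetical Python sketch of a simple block distribution. It is not PWCOND's actual scheme, and it does not reproduce what happens at line 444 of scatter_forw.f90, which this thread does not show. The names `block_range`, `idle_ranks`, and `check_distribution` are illustrative assumptions. The idea: once the number of processes exceeds the number of items being split, some ranks own an empty slice, and code that assumes every rank has at least one local item can index out of bounds. A well-designed code detects this and stops cleanly instead.

```python
def block_range(n_items, n_procs, rank):
    """Half-open range [start, stop) of items owned by `rank` under a
    simple block distribution (hypothetical; not PWCOND's scheme)."""
    base, extra = divmod(n_items, n_procs)
    # The first `extra` ranks get one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def idle_ranks(n_items, n_procs):
    """Ranks that would receive zero items (empty local slice)."""
    idle = []
    for r in range(n_procs):
        start, stop = block_range(n_items, n_procs, r)
        if start == stop:
            idle.append(r)
    return idle


def check_distribution(n_items, n_procs):
    """Stop cleanly with a clear message instead of letting an idle rank
    later index into an empty local array."""
    idle = idle_ranks(n_items, n_procs)
    if idle:
        raise ValueError(
            f"{n_procs} processes is too many for {n_items} items: "
            f"{len(idle)} ranks would get no work"
        )
```

With an illustrative 20 items, `check_distribution(20, 8)` passes because every rank owns at least two items, while `check_distribution(20, 32)` raises because ranks 20 to 31 would be idle. Whether the real code fails for this reason is exactly what inspecting line 444 would reveal.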


Paolo

On Mon, Aug 10, 2015 at 9:27 AM, 庞瑞(PANG Rui) <pang.r at sustc.edu.cn> wrote:

However, I found that I could run the example correctly using only 8 cores. With 16, 32, or 64 cores, the code stops at
" ngper, shell number =          804          82
 ngper, n2d =          804         391
---  E-Ef =    2.0000000  k =    0.5000000   0.5000000
---  ie =          1  ik =          1" 

and then reports:


"forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
pwcond.x           00000000025AD7C9  Unknown               Unknown  Unknown
pwcond.x           00000000025AC140  Unknown               Unknown  Unknown
pwcond.x           000000000255DBC2  Unknown               Unknown  Unknown
pwcond.x           00000000024F15F3  Unknown               Unknown  Unknown
pwcond.x           00000000024F753B  Unknown               Unknown  Unknown
libpthread.so.0    0000003F2280F710  Unknown               Unknown  Unknown
pwcond.x           00000000004F9B35  scatter_forw_             444  scatter_forw.f90
pwcond.x           00000000004ADD66  do_cond_                  518  do_cond.f90
pwcond.x           00000000004A8B42  MAIN__                     22  condmain.f90
pwcond.x           0000000000498036  Unknown               Unknown  Unknown
libc.so.6          0000003F2241ED1D  Unknown               Unknown  Unknown
pwcond.x           0000000000497F29  Unknown               Unknown  Unknown"

So could anyone tell me how I can fix this error so that I can use more cores to execute pwcond.x? I have encountered this problem in both versions 5.1.2 and 5.2.

Thanks very much for any help.

Sincerely.

Pang Rui



_______________________________________________
 Pw_forum mailing list
 Pw_forum at pwscf.org
 http://pwscf.org/mailman/listinfo/pw_forum




-- 
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
 Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
 Phone +39-0432-558216, fax +39-0432-558222