[Pw_forum] PW very sensitive to number of processors and pools
Christopher O'Brien
cjobrien at ncsu.edu
Sun Jun 27 20:54:56 CEST 2010
I am trying to do benchmarks and determine how best to parallelize pw.x for SCF calculations, using a toy system consisting of a single Cu unit cell. However, it seems as if there are 'magic' combinations of processors and pools that determine whether a run will work. Is there a pattern to choosing these parameters?
As an example, I used ecutwfc = 35 Ry, ecutrho = 350 Ry, and a Monkhorst-Pack k-point grid. I have an older Beowulf cluster with two single-core processors per node and 1 GB of RAM each.
To illustrate the sensitivity, two working and two failing cases are shown:
Status    k-points    nprocs  npool  nr3  nr3s  npp  npps
Working   10x10x10      20     10     30   20    15   12
Working   20x20x20      32      2     30   20     2    2
Failing   30x30x30      20      2     30   20     4    3
Failing   30x30x30      64      8     30   20     4    3
As you can tell, I am randomly guessing here. Even the 10x10x10 case took a lot of guessing to get it to work.
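For what it is worth, below is the rough sanity check I have been running on candidate combinations. It is a small script of my own, not part of Quantum ESPRESSO, and it only encodes my reading of the user guide's guidelines: the number of processors per pool should not exceed the number of FFT planes along z (nr3 for the dense grid, nr3s for the smooth grid), and npool should ideally divide both nprocs and the number of k-points. The helper name check_layout and the defaults nr3=30, nr3s=20 are just taken from the cases above. It does not flag any of the failing combinations, so I am clearly missing some other constraint.

def check_layout(nprocs, npool, nr3=30, nr3s=20, nkpts=None):
    """Flag (nprocs, npool) combinations that break the usual guidelines."""
    notes = []
    if nprocs % npool:
        notes.append("npool does not divide nprocs")
    per_pool = nprocs // npool
    if per_pool > nr3s:
        notes.append("%d procs/pool exceeds nr3s = %d" % (per_pool, nr3s))
    if nkpts is not None and nkpts % npool:
        notes.append("npool does not divide the %d k-points" % nkpts)
    verdict = "; ".join(notes) if notes else "looks reasonable"
    return "nprocs=%3d npool=%2d -> %2d procs/pool: %s" % (
        nprocs, npool, per_pool, verdict)

# The four combinations from the table above (nr3 = 30, nr3s = 20 throughout).
# Runs were launched roughly as: mpirun -np <nprocs> pw.x -npool <npool> < scf.in
for nprocs, npool in [(20, 10), (32, 2), (20, 2), (64, 8)]:
    print(check_layout(nprocs, npool))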
All of the crashes occur in one of two places: 1) while the list of k-points is being written, or 2) while the band energies at each k-point are being written.
Case 1)
> k( 282) = ( 0.7833333 -0.0500000 0.8166667), wk = 0.0008889
> k( 283) = ( 0.7500000 -0.0166667 0.7833333), wk = 0.0008889
> p1_8011: p4_error: net_recv read: probable EOF on socket: 1
> p7_25354: p4_error: net_recv read: probable EOF on socket: 1
> p19_22825: p4_error: net_recv read: probable EOF on socket: 1
Case 2)
> k = 0.4500-0.4500 1.3500 ( 271 PWs) bands (ev):
>
> 7.6764 9.3915 9.6220 10.1803 10.9132 12.6419 16.5014 30.5475
> 33.0769 33.8058
Thanks in advance.
===================================================================
Christopher J. O'Brien
cjobrien at ncsu.edu
Ph.D. Candidate
Computational Materials Group
Department of Materials Science & Engineering
North Carolina State University
__________________________________________________________________
Please send all documents in PDF, HTML, RTF, DVI, PS or plain text.
For Word documents: Please use the 'Save as PDF' option before sending.
===================================================================