<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Yes, I had been trying to take the easy way out. Now it is time to do it 'right'.</div><div><br></div><div>I had been looking through the 'understanding parallelism' section of the user manual, and was a bit confused about what is meant by some of the parameters. Below I will give my impression based on reading the manual - please correct me if I am wrong. </div><div><br></div><div>World: This is just the MPI_COMM_WORLD ... not really much to it.</div><div><br></div><div>Images (-nimage n): for a given run, the allocated processors are divided into n loosely coupled groups of processors, each operating on a different set of data. For relaxations and MD runs, this is not really important or useful. </div><div><br></div><div>Pools (-npool n): the procs dedicated to each image are further subdivided into n loosely coupled groups of processors. When k-point sampling is used, the k-points are divided amongst these pools.  Within each pool, the planewaves and real space (i.e. 3D FFT) grid points are distributed amongst the processors. So for my gamma point sampling with 1 pool, the planewaves and real space grids *should* be distributed amongst all the procs (from my output this appears to be the case). But since the division of the FFT grid appears to happen plane-wise (not sure which direction though... this part doesn't seem to be mentioned),  you run into trouble if the number of procs in a pool is greater than the number of planes in the FFT grid.</div><div><br></div><div>Task groups (-ntg n): splits the procs in a given pool into n groups, each of which works independently on the 3D FFT.  </div><div><br></div><div>Orthogonalization groups (-ndiag n): a subgroup of n procs from the pool is used in the orthonormalization or iterative subspace diagonalization of the Hamiltonian (I presume), a matrix that has dimensions of #states x #states. 
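<div><br></div><div>To make this reading of the hierarchy concrete, here is a minimal Python sketch of how the processor counts would divide if the description above is right. The function name group_sizes and the requirement that every level divides evenly are assumptions for illustration only; nothing here is taken from the manual or the code.</div>
<pre>
```python
def group_sizes(nproc, nimage=1, npool=1, ntg=1):
    """Processor counts per image, pool and task group.

    Assumes (hypothetically) that each level splits the level above
    into equal parts, as described in the list above.
    """
    if nproc % nimage != 0:
        raise ValueError("nproc must be divisible by nimage")
    per_image = nproc // nimage

    if per_image % npool != 0:
        raise ValueError("procs per image must be divisible by npool")
    per_pool = per_image // npool

    if per_pool % ntg != 0:
        raise ValueError("procs per pool must be divisible by ntg")
    per_task_group = per_pool // ntg

    return {
        "per_image": per_image,
        "per_pool": per_pool,
        "per_task_group": per_task_group,
    }


# The test case discussed in this post: 1 pool of 1024 procs, 32 task groups.
print(group_sizes(1024, nimage=1, npool=1, ntg=32))
# prints {'per_image': 1024, 'per_pool': 1024, 'per_task_group': 32}
```
</pre>
<div>Under that reading, each task group in the 1024-proc test holds 32 procs. Whether the FFT planes are then distributed inside each task group is exactly the open question here.</div>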
</div><div><br></div><div>The task group bit is where I run into a good deal of confusion, because even if I set the number of task groups such that all processors should have planes (I assume the FFT grid is distributed within each task group, so that each task group does the same thing), some processors still appear to get none. My output posted at the beginning of the thread has a section header like this:</div><div><br></div><div><div>     Proc/  planes cols     G    planes cols    G      columns  G</div><div>     Pool       (dense grid)       (smooth grid)      (wavefct grid)</div><div><br></div><div>It is the first column and the planes columns that now confuse me. In my test of 1 pool of 1024 procs with 32 task groups, the output looked like this:</div><div>Proc/  planes cols     G    planes cols    G      columns  G</div><div>     Pool       (dense grid)       (smooth grid)      (wavefct grid)</div><div>       1     15    162    50122   15    162    50122     42     6294</div><div>       2      0    162    50122    0    162    50122     42     6294</div><div>       3      0    162    50122    0    162    50122     42     6294</div><div>       4      0    162    50122    0    162    50122     42     6294</div><div>       5      0    162    50122    0    162    50122     42     6294</div><div>...</div><div>      32      0    164    50136    0    164    50136     42     6290</div><div>      33     15    164    50136   15    164    50136     42     6290</div><div>      34      0    164    50136    0    164    50136     42     6290</div><div> </div></div><div>This seems to indicate that there are processors that aren't getting any planes and that the FFT grid is *not* being reproduced within each task group. If it were being reproduced, I would expect the 'planes' columns to be either 1 or 0 for procs #1-32, 33-64, etc. 
Looking at the code, this output implies that the number of planes per proc (npp) is only nonzero on a small number of processors, actually making the balancing situation worse than when ntg = 1. </div><div><br></div><div>Dave</div><div><br></div><br><div><div>On Jan 28, 2009, at 6:07 PM, Nichols A. Romero wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">David,<br><br>You should really start by making estimates of how much memory your calculation needs. To<br>do that you will really need to understand the algorithm otherwise you will just end up playing with<br>parameters forever.<br> <br>ntg is for band parallelization. You have 2560 electrons.<br><br>ntg = 32 is probably too large. <br><br>Maybe somebody on this list can make a suggestion? My experience with a real-space code is<br>the more bands per processor the better. You will probably want at least 250-500 bands per processor.<br> <br><div class="gmail_quote">On Wed, Jan 28, 2009 at 4:20 PM, Nichols A. Romero <span dir="ltr"><<a href="mailto:naromero@gmail.com">naromero@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> David,<br><br>You have the:<br>ortho sub group set to 32*32<br><br>Paolo can correct me if I am wrong. This is the Scalapack blacs grid for the<br>cholesky decomposition. It basically takes the overlap matrix whose dimensions<br> are (number of states) by (number of states) and divides into 32-by-32 pieces<br>according to a 2D block cyclic algorithm. You are using 32*32=1024 processors<br>to do the cholesky decomposition of a 2560-by-2560.<br><br> I would recommend using something like 8*8. 
<br><div><div></div><div class="Wj3C7c"><br><div class="gmail_quote">On Wed, Jan 28, 2009 at 4:08 PM,  <span dir="ltr"><<a href="mailto:giannozz@democritos.it" target="_blank">giannozz@democritos.it</a>></span> wrote:<br> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div>Quoting David Farrell <<a href="mailto:davidfarrell2008@u.northwestern.edu" target="_blank">davidfarrell2008@u.northwestern.edu</a>>:<br> <br> > I am trying to run a 1152 atom, 2560 electron pw MD system on a BG/P,<br> >  and I believe I am running up against memory issues<br> <br> </div>set nbnd, diago_david_ndim, mixing_ndim to the smallest possible<br> values to save memory. Use the CVS version and try to compile scalapack<br> (instructions in the wiki) if you have trouble with subspace<br> diagonalization, or else use a smaller set of processors in the "ortho<br> group": 1024 seems to me a lot for a system with O(1000) states.<br> <br> Paolo<br> <br> ----------------------------------------------------------------<br> This message was sent using IMP, the Internet Messaging Program.<br> <div><div></div><div><br> _______________________________________________<br> Pw_forum mailing list<br> <a href="mailto:Pw_forum@pwscf.org" target="_blank">Pw_forum@pwscf.org</a><br> <a href="http://www.democritos.it/mailman/listinfo/pw_forum" target="_blank">http://www.democritos.it/mailman/listinfo/pw_forum</a><br> </div></div></blockquote></div><br><br clear="all"><br></div></div><div><div></div><div class="Wj3C7c">-- <br>Nichols A. Romero, Ph.D.<br>Argonne Leadership Computing Facility<br>Argonne, IL 60490<br>(630) 252-3441 (O)<br> (630) 470-0462 (C)<br><br> </div></div></blockquote></div><br><br clear="all"><br>-- <br>Nichols A. 
Romero, Ph.D.<br>Argonne Leadership Computing Facility<br>Argonne, IL 60490<br>(630) 252-3441 (O)<br>(630) 470-0462 (C)<br><br></blockquote></div><br><div apple-content-edited="true"> <span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; font-size: 12px; ">David E. 
Farrell</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font class="Apple-style-span" size="3"><span class="Apple-style-span" style="font-size: 12px; ">Post-Doctoral Fellow</span></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font class="Apple-style-span" size="3"><span class="Apple-style-span" style="font-size: 12px; ">Department of Materials Science and Engineering</span></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; font-size: 12px; ">Northwestern University</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Helvetica" size="3" style="font: normal normal normal 12px/normal Helvetica; font-size: 12px; ">email: <a href="mailto:d-farrell2@northwestern.edu">d-farrell2@northwestern.edu</a></font></div></div></div></span> </div><br></body></html>