[Pw_forum] PW taskgroups and a large run on a BG/P

David Farrell davidfarrell2008 at u.northwestern.edu
Thu Jan 29 17:21:47 CET 2009


Yes, I had been trying to take the easy way out. Now it is time to do it
'right'.

I had been looking through the 'understanding parallelism' section of
the user manual and was a bit confused about what some of the
parameters mean. Below I give my impression based on reading the
manual - please correct me if I am wrong.

World: This is just the MPI_COMM_WORLD ... not really much to it.

Images (-nimage n): for a given run, the allocated processors are
divided into n loosely coupled groups, each operating on a different
set of data. For relaxations and MD runs, this is not really important
or useful.

Pools (-npool n): the procs dedicated to each image are further
subdivided into n loosely coupled groups of processors. When k-point
sampling is used, the k-points are divided amongst these pools. Within
each pool, the plane waves and the real-space (i.e. 3D FFT) grid points
are distributed amongst the processors. So for my gamma-point sampling
with 1 pool, the plane waves and real-space grids *should* be
distributed amongst all the procs (from my output this appears to be
the case). But since the division of the FFT grid appears to happen
plane-wise (I'm not sure along which direction, though... this part
doesn't seem to be mentioned), you run into trouble if the number of
procs in a pool is greater than the number of planes in the FFT grid.
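
Just to spell that constraint out with numbers - hypothetical ones,
since I'm not even sure which FFT dimension gets sliced - here is the
plane distribution as I picture it (a Python sketch purely for
illustration; the function and numbers are mine, not QE's):

# A simple balanced split of nr3 FFT planes over the procs in a pool
# (my assumption of how the distribution works, not the actual QE code).
def planes_per_proc(nr3, nprocs):
    base, rem = divmod(nr3, nprocs)
    return [base + 1 if i < rem else base for i in range(nprocs)]

# hypothetical: 96 planes over one pool of 1024 procs
npp = planes_per_proc(96, 1024)
print(npp.count(0), "of", len(npp), "procs get no planes")   # -> 928 of 1024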

Task groups (-ntg n): splits the procs in a given pool into n groups,
each of which works independently on the 3D FFTs.
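
If that is right, then the way I *expect* -ntg to help is roughly this
(again my own sketch with hypothetical numbers, not QE code):

# With ntg task groups, only the nproc_pool // ntg procs inside one
# group need to share the nr3 planes; the groups work on different
# bands' FFTs at the same time.
nr3, nproc_pool, ntg = 96, 1024, 32
procs_sharing_fft = nproc_pool // ntg            # 32 procs per task group
print(nr3 // procs_sharing_fft, "planes per proc, no idle procs")  # -> 3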

Orthogonalization groups (-ndiag n): a subgroup of n procs from the
pool is used for the orthonormalization or iterative subspace
diagonalization of the Hamiltonian (I presume), a matrix with
dimensions of #states x #states.
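
To get a feel for the sizes involved, here is a rough sketch of how I
picture the ortho group (my own illustration - the square process grid
and the way the blocks are sized are assumptions on my part, and I am
using the 2560-by-2560 matrix size from Nichols's message below):

import math

# Rough picture of the ortho group: ndiag procs arranged in a ~square
# process grid, with the nstates x nstates matrix spread across them.
def ortho_piece(ndiag, nstates):
    side = math.isqrt(ndiag)                 # e.g. 8 for ndiag = 64
    local = math.ceil(nstates / side)        # rows/cols held per proc
    mem_mb = local * local * 8 / 1e6         # double precision, in MB
    return side, local, mem_mb

print(ortho_piece(64, 2560))     # 8x8 grid   -> ~320x320 block, ~0.8 MB/proc
print(ortho_piece(1024, 2560))   # 32x32 grid -> ~80x80 block, ~0.05 MB/proc

So with 32*32 procs, each one ends up holding only a tiny piece, which
fits with the suggestion below to use something much smaller like 8*8.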

The task-group bit is where I run into a good deal of confusion,
because I had set the number of task groups such that all processors
should end up with planes (I assumed the FFT grid would be distributed
within each task group, with every task group doing the same thing),
and that does not seem to be what happened. My output posted at the
beginning of the thread has a section with a header like this:

      Proc/  planes cols     G    planes cols    G      columns  G
      Pool       (dense grid)       (smooth grid)      (wavefct grid)

It is the first column and the 'planes' columns that now confuse me.
In my test of 1 pool of 1024 procs with 32 task groups, the output
looked like this:
      Proc/  planes cols     G    planes cols    G      columns  G
      Pool       (dense grid)       (smooth grid)      (wavefct grid)
        1     15    162    50122   15    162    50122     42     6294
        2      0    162    50122    0    162    50122     42     6294
        3      0    162    50122    0    162    50122     42     6294
        4      0    162    50122    0    162    50122     42     6294
        5      0    162    50122    0    162    50122     42     6294
...
       32      0    164    50136    0    164    50136     42     6290
       33     15    164    50136   15    164    50136     42     6290
       34      0    164    50136    0    164    50136     42     6290

This seems to indicate that there are processors that aren't getting
any planes and that the FFT grid is *not* being reproduced within each
task group. If it were, I would expect the 'planes' columns to be
either 1 or 0 for procs #1-32, #33-64, etc. Looking at the code, this
output implies that the number of planes per proc (npp) is nonzero on
only a small number of processors, actually making the load-balancing
situation worse than when ntg = 1.
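
As a back-of-the-envelope check on that reading - and only if the
pattern in the excerpt (one proc in every 32 holding 15 planes, the
rest holding none) really does continue across all 1024 procs, which
the '...' obviously doesn't prove:

# Tally of the dense-grid 'planes' column, assuming the excerpt's
# pattern (one proc in 32 with 15 planes, 31 with 0) holds throughout.
npp = ([15] + [0] * 31) * 32
print(sum(1 for p in npp if p), "of", len(npp), "procs hold planes")  # 32 of 1024
print("implied total planes:", sum(npp))                              # 480

If that tally is right, then with ntg = 1 those ~480 planes would at
least have been spread one per proc over ~480 processors, rather than
15 apiece over 32.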

Dave


On Jan 28, 2009, at 6:07 PM, Nichols A. Romero wrote:

> David,
>
> You should really start by making estimates of how much memory your
> calculation needs. To do that you will really need to understand the
> algorithm; otherwise you will just end up playing with parameters
> forever.
>
> ntg is for band parallelization. You have 2560 electrons.
>
> ntg = 32 is probably too large.
>
> Maybe somebody on this list can make a suggestion? My experience
> with a real-space code is that the more bands per processor, the
> better. You will probably want at least 250-500 bands per processor.
>
> On Wed, Jan 28, 2009 at 4:20 PM, Nichols A. Romero  
> <naromero at gmail.com> wrote:
> David,
>
> You have the:
> ortho sub group set to 32*32
>
> Paolo can correct me if I am wrong. This is the ScaLAPACK BLACS grid
> for the Cholesky decomposition. It basically takes the overlap matrix,
> whose dimensions are (number of states) by (number of states), and
> divides it into pieces over a 32-by-32 process grid according to a
> 2D block-cyclic algorithm. You are using 32*32=1024 processors to do
> the Cholesky decomposition of a 2560-by-2560 matrix.
>
> I would recommend using something like 8*8.
>
> On Wed, Jan 28, 2009 at 4:08 PM, <giannozz at democritos.it> wrote:
> Quoting David Farrell <davidfarrell2008 at u.northwestern.edu>:
>
> > I am trying to run a 1152 atom, 2560 electron pw MD system on a
> > BG/P, and I believe I am running up against memory issues
>
> set nbnd, diago_david_ndim, mixing_ndim to the smallest possible
> values to save memory. Use the CVS version and try to compile  
> scalapack
> (instructions in the wiki) if you have trouble with subspace
> diagonalization, or else use a smaller set of processors in the "ortho
> group": 1024 seems to me a lot for a system with O(1000) states.
>
> Paolo
>
> -- 
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne, IL 60490
> (630) 252-3441 (O)
> (630) 470-0462 (C)

David E. Farrell
Post-Doctoral Fellow
Department of Materials Science and Engineering
Northwestern University
email: d-farrell2 at northwestern.edu
