[Pw_forum] PW taskgroups and a large run on a BG/P
David Farrell
davidfarrell2008 at u.northwestern.edu
Wed Jan 28 20:46:12 CET 2009
I am trying to run a 1152 atom, 2560 electron pw.x MD calculation on a BG/P,
and I believe I am running up against memory issues (not a
surprise...) - but I am not exactly sure how to debug and solve the
issue. I am trying to run on 1024 procs (I've tried smp, dual and
vn mode), though I guess I may have to go higher - I am not
certain yet.
I have kept npools = 1 and nimage = 1, since they didn't seem applicable
to my run. I have tried varying ntg among 1, 2 and 32, and kept ndiag at
the default.
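For reference, here is the quick Python sketch I use for the per-rank memory
budget (assuming the standard 4-core, 2 GB/node BG/P configuration - that
figure is my assumption, not something from the output):

node_mem_mb = 2048                     # assumed 2 GB of RAM per BG/P node
for mode, ranks_per_node in (("smp", 1), ("dual", 2), ("vn", 4)):
    print(mode, "->", node_mem_mb // ranks_per_node, "MB per MPI rank")
# smp -> 2048 MB, dual -> 1024 MB, vn -> 512 MB per rank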
While varying the number of taskgroups (ntg), I would get output like
the following:
Parallel version (MPI)
Number of processors in use: 1024
R & G space division: proc/pool = 1024
wavefunctions fft division: fft/group = 2
For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials
or PAW
Current dimensions of program pwscf are:
Max number of different atomic species (ntypx) = 10
Max number of k-points (npk) = 40000
Max angular momentum in pseudopotentials (lmaxx) = 3
gamma-point specific algorithms are used
Iterative solution of the eigenvalue problem
a parallel distributed memory algorithm will be used,
eigenstates matrixes will be distributed block like on
ortho sub-group = 32* 32 procs
Message from routine data_structure:
some processors have no planes
Message from routine data_structure:
some processors have no smooth planes
Planes per process (thick) : nr3 =480 npp = 1 ncplane =*****
   Proc/  planes  cols     G    planes  cols     G    columns     G
   Pool       (dense grid)         (smooth grid)       (wavefct grid)
     1      1     162   50122     1     162   50122      42     6294
     2      0     162   50122     0     162   50122      42     6294
     3      1     162   50122     1     162   50122      42     6294
     4      0     162   50122     0     162   50122      42     6294
     5      1     162   50122     1     162   50122      42     6294
(continues similarly for each of the 1024 procs)
So the number of FFT planes that need to be distributed is 480. The
per-proc table below that made it seem like there were processors which
still weren't taking part in the calculation, and presumably weren't
helping out with the distribution of the data.
My understanding is that the processors of each taskgroup would take
part in the FFT calculation for the plane associated with the task
group. So my first question is - is the fact that some procs don't
have planes in my output actually an issue?
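If that picture is right, the numbers would work out roughly like this Python
sketch (my reading of what ntg does may well be wrong, so take it as a guess):

# Dense-grid planes and R & G procs, from the output above
nr3, nproc = 480, 1024

# Without task groups, at most one plane per proc:
print(min(nr3, nproc), "procs hold a plane,", nproc - min(nr3, nproc), "hold none")
# -> 480 procs hold a plane, 544 hold none (the npp = 0 rows above)

# With ntg task groups, each group of nproc/ntg procs would do the FFTs for its
# own subset of bands, so the planes only have to cover nproc/ntg procs:
for ntg in (1, 2, 32):
    per_group = nproc // ntg
    print("ntg =", ntg, "->", per_group, "procs per group,",
          max(per_group - nr3, 0), "of them without a plane")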
The output continues, and the run finally dies here:
   Largest allocated arrays       est. size (Mb)    dimensions
      Kohn-Sham Wavefunctions          73.76 Mb     (   3147, 1536)
      NL pseudopotentials             227.42 Mb     (   3147, 4736)
      Each V/rho on FFT grid            3.52 Mb     ( 230400)
      Each G-vector array               0.19 Mb     (  25061)
      G-vector shells                   0.08 Mb     (  10422)
   Largest temporary arrays       est. size (Mb)    dimensions
      Auxiliary wavefunctions          73.76 Mb     (   3147, 3072)
      Each subspace H/S matrix         72.00 Mb     (   3072, 3072)
      Each <psi_i|beta_j> matrix       55.50 Mb     (   4736, 1536)
      Arrays for rho mixing            28.12 Mb     ( 230400,    8)
Initial potential from superposition of free atoms
Check: negative starting charge= -7.401460
starting charge 2556.45492, renormalised to 2560.00000
negative rho (up, down): 0.741E+01 0.000E+00
Starting wfc are 2944 atomic wfcs
total cpu time spent up to now is 704.01 secs
per-process dynamical memory: 13.6 Mb
Self-consistent Calculation
iteration # 1 ecut= 38.22 Ry beta=0.70
Davidson diagonalization with overlap
ethr = 1.00E-02, avg # of iterations = 2.0
process group 2362 has completed
with an error like this in the stderr file:
Abort(1) on node 210 (rank 210 in comm 1140850688): Fatal error in MPI_Scatterv: Other MPI error, error stack:
MPI_Scatterv(360): MPI_Scatterv(sbuf=0x36c02010, scnts=0x7fffa940, displs=0x7fffb940, MPI_DOUBLE_PRECISION, rbuf=0x4b83010, rcount=230400, MPI_DOUBLE_PRECISION, root=0, comm=0x84000002) failed
MPI_Scatterv(100): Out of memory
So I figure I am running out of memory on a node at some point... but
I am not entirely sure where (it seems to be in the first electronic step)
or how to get around it.
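In case it helps, here is the arithmetic I did while staring at this (a rough
Python sketch; in particular, reading the Scatterv count as one plane of the
dense grid is my guess):

MB = 1024.0 ** 2

# Per-process array sizes as reported by pw.x above
reported_mb = {
    "Kohn-Sham Wavefunctions":     73.76,
    "NL pseudopotentials":        227.42,
    "Each V/rho on FFT grid":       3.52,
    "Auxiliary wavefunctions":     73.76,
    "Each subspace H/S matrix":    72.00,
    "Each <psi_i|beta_j> matrix":  55.50,
    "Arrays for rho mixing":       28.12,
}
print(round(sum(reported_mb.values()), 1), "MB for these arrays alone")
# ~534 MB, already more than the ~512 MB per rank available in vn mode

# The crash is in MPI_Scatterv with rcount = 230400 = 480*480, i.e. one plane of
# the 480^3 dense grid. If the root really is scattering the whole grid plane by
# plane (my guess), its send buffer alone would be:
print(round(480**3 * 8 / MB, 1), "MB of real*8 on the root")   # ~843.8 MB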
Any help would be appreciated.
Dave
David E. Farrell
Post-Doctoral Fellow
Department of Materials Science and Engineering
Northwestern University
email: d-farrell2 at northwestern.edu