[Pw_forum] PW taskgroups and a large run on a BG/P

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Wed Jan 28 21:04:53 CET 2009


On Wed, 28 Jan 2009, David Farrell wrote:


[...]

DF>     Largest allocated arrays     est. size (Mb)     dimensions
DF>        Kohn-Sham Wavefunctions        73.76 Mb     (   3147,1536)
DF>        NL pseudopotentials           227.42 Mb     (   3147,4736)
DF>        Each V/rho on FFT grid          3.52 Mb     ( 230400)
DF>        Each G-vector array             0.19 Mb     (  25061)
DF>        G-vector shells                 0.08 Mb     (  10422)
DF>     Largest temporary arrays     est. size (Mb)     dimensions
DF>        Auxiliary wavefunctions        73.76 Mb     (   3147,3072)
DF>        Each subspace H/S matrix       72.00 Mb     (   3072,3072)
DF>        Each <psi_i|beta_j> matrix     55.50 Mb     (   4736,1536)
DF>        Arrays for rho mixing          28.12 Mb     ( 230400,   8)
DF> 
[...]
DF> with an error like this in the stderr file:
DF> 
DF> Abort(1) on node 210 (rank 210 in comm 1140850688): Fatal error in
DF> MPI_Scatterv: Other MPI error, error stack:
DF> MPI_Scatterv(360): MPI_Scatterv(sbuf=0x36c02010, scnts=0x7fffa940,
DF> displs=0x7fffb940, MPI_DOUBLE_PRECISION,
DF> rbuf=0x4b83010, rcount=230400, MPI_DOUBLE_PRECISION, root=0,
DF> comm=0x84000002) failed
DF> MPI_Scatterv(100): Out of memory
DF> 
DF> So I figure I am running out of memory on a node at some point... but not
DF> entirely sure where (seems to be in the first electronic step) or how to get
DF> around it.

it dies on a processor calling MPI_Scatterv, probably the (group) master(s).
what is interesting is that rcount (230400) matches the first dimension of
the "Arrays for rho mixing", so i would suggest looking there first and
trying to determine how large the combined send buffers are.
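
as a rough sketch of that estimate (not taken from the pw.x sources): the
snippet below turns element counts into MiB. rcount comes from the error
stack above. n_ranks is a made-up placeholder, since the log shows neither
how many ranks are in the communicator nor the per-rank send counts (scnts),
and the sketch assumes every rank receives the same count. the last part is a
consistency check against the table: 230400 x 8 elements at 16 bytes each
(assumed to be complex double precision) reproduce the 28.12 Mb reported for
the rho mixing arrays.

```python
# Rough size estimate for the buffers involved in the failing MPI_Scatterv.
# Values marked "from the log" are copied from the output quoted above;
# everything else is an assumption made for illustration only.

BYTES_PER_DOUBLE = 8    # MPI_DOUBLE_PRECISION
BYTES_PER_COMPLEX = 16  # assumption: complex double precision elements
MIB = 1024 ** 2


def mib(n_elements, bytes_per_element=BYTES_PER_DOUBLE):
    """Return the size in MiB of an array with n_elements entries."""
    return n_elements * bytes_per_element / MIB


rcount = 230400  # from the log: receive count on the failing rank
n_ranks = 211    # hypothetical: the real communicator size is not shown

# Receive buffer on one rank.
recv_mib = mib(rcount)

# Send buffer on the root, assuming every rank gets rcount elements.
# The real scnts array may differ from rank to rank.
send_mib = mib(rcount * n_ranks)

# Consistency check against the "Arrays for rho mixing" line (230400, 8).
mixing_mib = mib(230400 * 8, BYTES_PER_COMPLEX)

print(f"receive buffer per rank : {recv_mib:.2f} MiB")
print(f"send buffer on the root : {send_mib:.1f} MiB for {n_ranks} ranks")
print(f"rho mixing arrays       : {mixing_mib:.3f} MiB")
```

with these inputs the receive buffer is about 1.76 MiB per rank, while the
send buffer on the root grows linearly with the number of ranks (about 371 MiB
for the hypothetical 211), so the actual scnts values are the thing to check.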

cheers,
   axel.


DF> 
DF> Any help would be appreciated.
DF> 
DF> Dave
DF> 
DF> 
DF> 
DF> 
DF> David E. Farrell
DF> Post-Doctoral Fellow
DF> Department of Materials Science and Engineering
DF> Northwestern University
DF> email: d-farrell2 at northwestern.edu
DF> 
DF> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.


