[Pw_forum] PW taskgroups and a large run on a BG/P
Axel Kohlmeyer
akohlmey at cmm.chem.upenn.edu
Wed Jan 28 22:58:54 CET 2009
On Wed, 28 Jan 2009, David Farrell wrote:
DF> I am trying to re-run this case to see if this error is reproducible, and
DF> trying the smp-mode version with 1 taskgroup to see if I can get a better
DF> read on where the MPI_Scatterv is being called from (there was no core file
DF> for the master process for some reason.)... and I am not really sure how to
DF> go about finding out the send buffer size (I guess a debugger may be the
DF> only option?)
grep does the job. MPI_Scatterv is called only in fft_base.f90, and only
in three subroutines there. pw.x uses only two of them: one is called from
exx.f90, which you most likely don't use, but the other, called from
psymrho.f90/psymrho_mag.f90, is consistent with the problem happening
at the rho-mixing stage. There is a "grid size" allocate/deallocate
around the call to grid_scatter in psymrho.f90, so that may exhaust
the memory when MPI_SCATTERV needs some internally...
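
For a rough idea of the sizes involved, here is a minimal sketch of the
arithmetic (it is not code from pw.x). It assumes 8 bytes per
MPI_DOUBLE_PRECISION element and, as a simplification, that every rank
receives the same rcount as the one shown in the error stack. The process
count nproc is a hypothetical value, since the thread does not state it.

```python
# Rough estimate of MPI_Scatterv buffer sizes from the error stack.
# Assumptions (not taken from the pw.x sources):
#   - 8 bytes per MPI_DOUBLE_PRECISION element
#   - every rank receives the same number of elements
#   - nproc below is hypothetical; the thread does not give the real value

BYTES_PER_DOUBLE = 8
MIB = 1024 ** 2


def recv_buffer_mib(rcount):
    """Size of one rank's receive buffer in MiB."""
    return rcount * BYTES_PER_DOUBLE / MIB


def root_send_buffer_mib(send_counts):
    """Size of the combined send buffer held by the root rank in MiB."""
    return sum(send_counts) * BYTES_PER_DOUBLE / MIB


rcount = 230400   # rcount reported in the MPI_Scatterv error stack
nproc = 1024      # hypothetical number of MPI processes

print(recv_buffer_mib(rcount))                 # 1.7578125
print(root_send_buffer_mib([rcount] * nproc))  # 1800.0
```

Under these assumptions the receive side is small, while the buffer on the
root grows linearly with the number of processes, which is why the
allocation around grid_scatter is the first place to look.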
HTH,
axel.
DF>
DF> Dave
DF> On Jan 28, 2009, at 2:04 PM, Axel Kohlmeyer wrote:
DF>
DF> >On Wed, 28 Jan 2009, David Farrell wrote:
DF> >
DF> >
DF> >[...]
DF> >
DF> > DF> Largest allocated arrays      est. size (Mb)   dimensions
DF> > DF>   Kohn-Sham Wavefunctions        73.76 Mb      (3147,1536)
DF> > DF>   NL pseudopotentials           227.42 Mb      (3147,4736)
DF> > DF>   Each V/rho on FFT grid          3.52 Mb      (230400)
DF> > DF>   Each G-vector array             0.19 Mb      (25061)
DF> > DF>   G-vector shells                 0.08 Mb      (10422)
DF> > DF> Largest temporary arrays      est. size (Mb)   dimensions
DF> > DF>   Auxiliary wavefunctions        73.76 Mb      (3147,3072)
DF> > DF>   Each subspace H/S matrix       72.00 Mb      (3072,3072)
DF> > DF>   Each <psi_i|beta_j> matrix     55.50 Mb      (4736,1536)
DF> > DF>   Arrays for rho mixing          28.12 Mb      (230400, 8)
DF> > DF>
DF> >[...]
DF> > DF> with an error like this in the stderr file:
DF> > DF>
DF> > DF> Abort(1) on node 210 (rank 210 in comm 1140850688): Fatal error in
DF> > DF> MPI_Scatterv: Other MPI error, error stack:
DF> > DF> MPI_Scatterv(360): MPI_Scatterv(sbuf=0x36c02010, scnts=0x7fffa940,
DF> > DF> displs=0x7fffb940, MPI_DOUBLE_PRECISION, rbuf=0x4b83010, rcount=230400,
DF> > DF> MPI_DOUBLE_PRECISION, root=0, comm=0x84000002) failed
DF> > DF> MPI_Scatterv(100): Out of memory
DF> > DF>
DF> > DF> So I figure I am running out of memory on a node at some point... but not
DF> > DF> entirely sure where (seems to be in the first electronic step) or how to
DF> > DF> get around it.
DF> >
DF> >it dies on the processor calling MPI_Scatterv, probably the
DF> >(group)master(s).
DF> >what is interesting is that the rcount size matches the "arrays for rho
DF> >mixing", so i would suggest to first have a look there and try to
DF> >determine how large the combined send buffers are.
DF> >
DF> >cheers,
DF> > axel.
DF> >
DF> >
DF> > DF>
DF> > DF> Any help would be appreciated.
DF> > DF>
DF> > DF> Dave
DF> > DF>
DF> > DF>
DF> > DF>
DF> > DF>
DF> > DF> David E. Farrell
DF> > DF> Post-Doctoral Fellow
DF> > DF> Department of Materials Science and Engineering
DF> > DF> Northwestern University
DF> > DF> email: d-farrell2 at northwestern.edu
DF> > DF>
DF> > DF>
DF> >
DF> >--
DF> >=======================================================================
DF> >Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
DF> > Center for Molecular Modeling -- University of Pennsylvania
DF> >Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
DF> >tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
DF> >=======================================================================
DF> >If you make something idiot-proof, the universe creates a better idiot.