David,<br><br>It looks like PWSCF calls ScaLAPACK for its Cholesky decomposition. If you don't care too much about performance, doing the Cholesky decomposition at your system size should not slow you down terribly. However, you may<br>
pay a memory penalty; it depends on how the Cholesky decomposition is set up. I imagine they probably collect<br>the entire overlap matrix in one place before handing it to ScaLAPACK.<br><br><div class="gmail_quote">On Wed, Jan 28, 2009 at 3:18 PM, David Farrell <span dir="ltr"><<a href="mailto:davidfarrell2008@u.northwestern.edu">davidfarrell2008@u.northwestern.edu</a>></span> wrote:<br>
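A quick back-of-the-envelope check using the array dimensions from the output quoted below (a sketch, not PWSCF code; the rank count is an assumption -- rank 210 appears in the traceback, so there are at least 211 MPI ranks):<br>

```python
# Rough memory estimates from the dimensions in the quoted PWSCF output.

def mib(nbytes: int) -> float:
    """Convert a byte count to MiB."""
    return nbytes / 2**20

# If the full (3072 x 3072) subspace overlap matrix is gathered on one rank
# as double precision before the ScaLAPACK Cholesky call, that extra copy
# alone costs:
n = 3072
overlap_bytes = n * n * 8                     # 8 bytes per double
print(f"replicated overlap matrix: {mib(overlap_bytes):.2f} MiB")     # 72.00 MiB

# The failing MPI_Scatterv has rcount = 230400 MPI_DOUBLE_PRECISION per
# rank, so the root's send buffer must hold one such slice per rank:
rcount = 230400
nranks = 256                                  # assumed rank count
sendbuf_bytes = rcount * 8 * nranks
print(f"Scatterv send buffer on root: {mib(sendbuf_bytes):.1f} MiB")  # 450.0 MiB
```

Note that 72 MiB matches the reported "Each subspace H/S matrix" size exactly, and on a node where each process sees about 1 GB, a ~450 MiB send buffer on the root rank would plausibly trigger the "Out of memory" inside MPI_Scatterv.<br>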
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div style=""><div>Oddly enough, the same input file, run in dual mode with 1 taskgroup (so each process should have access to 1 GB of RAM), doesn't spit out the previous error, but rather this one:</div>
<div><br></div><div><div>%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div> from pdpotf : error # 1</div><div> problems computing cholesky decomposition </div><div>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</div><div><br></div><div>I would have expected this one to fail the same way. The solution suggested on the mailing list seems to be to disable the parallel Cholesky decomposition, but that doesn't seem like a very good option in my case.</div>
<div><br></div><div>I am trying to re-run this case to see if this error is reproducible, and trying the smp-mode version with 1 taskgroup to see if I can get a better read on where the MPI_Scatterv is being called from (there was no core file for the master process for some reason.)... and I am not really sure how to go about finding out the send buffer size (I guess a debugger may be the only option?)</div>
<div><br></div><div>Dave</div></div><div><div></div><div class="Wj3C7c"><div><br></div><div><br></div><div><br></div><br><div><div>On Jan 28, 2009, at 2:04 PM, Axel Kohlmeyer wrote:</div><br><blockquote type="cite"><div>On Wed, 28 Jan 2009, David Farrell wrote:<br>
<br><br>[...]<br><br>DF> Largest allocated arrays est. size (Mb) dimensions<br>DF> Kohn-Sham Wavefunctions 73.76 Mb ( 3147,1536)<br>DF> NL pseudopotentials 227.42 Mb ( 3147,4736)<br>
DF> Each V/rho on FFT grid 3.52 Mb ( 230400)<br>DF> Each G-vector array 0.19 Mb ( 25061)<br>DF> G-vector shells 0.08 Mb ( 10422)<br>DF> Largest temporary arrays est. size (Mb) dimensions<br>
DF> Auxiliary wavefunctions 73.76 Mb ( 3147,3072)<br>DF> Each subspace H/S matrix 72.00 Mb ( 3072,3072)<br>DF> Each <psi_i|beta_j> matrix 55.50 Mb ( 4736,1536)<br>
DF> Arrays for rho mixing 28.12 Mb ( 230400, 8)<br>DF> <br>[...]<br>DF> with an error like this in the stderr file:<br>DF> <br>DF> Abort(1) on node 210 (rank 210 in comm 1140850688): Fatal error in<br>
DF> MPI_Scatterv: Other MPI error, error sta<br>DF> ck:<br>DF> MPI_Scatterv(360): MPI_Scatterv(sbuf=0x36c02010, scnts=0x7fffa940,<br>DF> displs=0x7fffb940, MPI_DOUBLE_PRECISION,<br>DF> rbuf=0x4b83010, rcount=230400, MPI_DOUBLE_PRECISION, root=0,<br>
DF> comm=0x84000002) failed<br>DF> MPI_Scatterv(100): Out of memory<br>DF> <br>DF> So I figure I am running out of memory on a node at some point... but not<br>DF> entirely sure where (seems to be in the first electronic step) or how to get<br>
DF> around it.<br><br>it dies on the processor calling MPI_Scatterv, probably the (group)master(s). <br>what is interesting is that the rcount size matches the "arrays for rho <br>mixing", so i would suggest to first have a look there and try to <br>
determine how large the combined send buffers are.<br><br>cheers,<br> axel.<br><br><br>DF> <br>DF> Any help would be appreciated.<br>DF> <br>DF> Dave<br>DF> <br>DF> <br>DF> <br>DF> <br>DF> David E. Farrell<br>
DF> Post-Doctoral Fellow<br>DF> Department of Materials Science and Engineering<br>DF> Northwestern University<br>DF> email: <a href="mailto:d-farrell2@northwestern.edu" target="_blank">d-farrell2@northwestern.edu</a><br>
DF> <br>DF> <br><br>-- <br>=======================================================================<br>Axel Kohlmeyer <a href="mailto:akohlmey@cmm.chem.upenn.edu" target="_blank">akohlmey@cmm.chem.upenn.edu</a> <a href="http://www.cmm.upenn.edu" target="_blank">http://www.cmm.upenn.edu</a><br>
Center for Molecular Modeling -- University of Pennsylvania<br>Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323<br>tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425<br>=======================================================================<br>
If you make something idiot-proof, the universe creates a better idiot.<br></div></blockquote></div><br></div></div><div> <span style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><div style="">
<div><div class="Ih2E3d"><div style="margin: 0px;"><font style="font-size: 12px;" face="Helvetica" size="3">David E. Farrell</font></div><div style="margin: 0px;"><font size="3"><span style="font-size: 12px;">Post-Doctoral Fellow</span></font></div>
</div><div class="Ih2E3d"><div style="margin: 0px;"><font size="3"><span style="font-size: 12px;">Department of Materials Science and Engineering</span></font></div></div><div style="margin: 0px;"><font style="font-size: 12px;" face="Helvetica" size="3">Northwestern University</font></div>
<div class="Ih2E3d"><div style="margin: 0px;"><font style="font-size: 12px;" face="Helvetica" size="3">email: <a href="mailto:d-farrell2@northwestern.edu" target="_blank">d-farrell2@northwestern.edu</a></font></div></div>
</div></div></span> </div><br></div><br>_______________________________________________<br>
Pw_forum mailing list<br>
<a href="mailto:Pw_forum@pwscf.org">Pw_forum@pwscf.org</a><br>
<a href="http://www.democritos.it/mailman/listinfo/pw_forum" target="_blank">http://www.democritos.it/mailman/listinfo/pw_forum</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br>Nichols A. Romero, Ph.D.<br>Argonne Leadership Computing Facility<br>Argonne, IL 60490<br>(630) 252-3441 (O)<br>(630) 470-0462 (C)<br><br>