[Pw_forum] PW taskgroups and a large run on a BG/P
David Farrell
davidfarrell2008 at u.northwestern.edu
Wed Feb 11 20:16:30 CET 2009
I was able to make a bit more progress - at least in the direction of
seeing what breaks as I get to larger numbers of electrons. It
seems to be mainly due to several large allocations that every
process carries out. I don't yet know the code well enough to be
sure, but I suspect not all of them are necessary.
I have found 2 that have caused problems in different runs. The first
was the problem that Axel pointed out earlier in this thread, the next
is this one:
In my periodic bulk case (1156 atoms, 2560 electrons), running on 1024
procs in vn mode (256 nodes, 512 MB RAM/process):
"add_vuspsi.f90", line 78: 1525-108 Error encountered while attempting
to allocate a data object. The program will stop.
- at this line there is an allocate of size (number of projectors
for the atom types) x (number of states), so something like
4736 * 1536, or ~55 MB, which seems to be kicking it over the
limit. This prevented the run from getting to the point where it
would output info about the SCF steps.
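The ~55 MB figure can be checked with a couple of lines of arithmetic. This is only a sketch: the 8 bytes per element is an assumption (double-precision real), chosen because it matches the 55.50 Mb that the memory report gives for the same dimensions, and the function name is made up for illustration.

```python
# Rough size of a (projectors x states) array like the one in the failing allocate.
# Assumption: 8 bytes per element (double-precision real), inferred from the
# memory report rather than read from the source code.
BYTES_PER_ELEMENT = 8
MIB = 1024 ** 2

def array_size_mib(n_rows, n_cols, bytes_per_element=BYTES_PER_ELEMENT):
    """Return the size of an n_rows x n_cols array in MiB."""
    return n_rows * n_cols * bytes_per_element / MIB

becp_mib = array_size_mib(4736, 1536)
print(f"{becp_mib:.2f} MiB")
```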
My memory report output for this run looks like:
Largest allocated arrays       est. size (Mb)   dimensions
   Kohn-Sham Wavefunctions         3.35 Mb      ( 143,1536)
   NL pseudopotentials            10.33 Mb      ( 143,4736)
   Each V/rho on FFT grid          0.78 Mb      ( 51200)
   Each G-vector array             0.01 Mb      ( 1006)
   G-vector shells                 0.00 Mb      ( 488)
Largest temporary arrays       est. size (Mb)   dimensions
   Auxiliary wavefunctions         6.70 Mb      ( 143,6144)
   Each subspace H/S matrix      288.00 Mb      ( 6144,6144)
   Each <psi_i|beta_j> matrix     55.50 Mb      ( 4736,1536)
   Arrays for rho mixing           6.25 Mb      ( 51200, 8)
So I am guessing that something just isn't getting split up right or
at least not very efficiently - probably the Hamiltonian and Overlap
matrices as a start.
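As a rough check on that guess, the temporary-array sizes in the report can be reproduced from their dimensions and compared with the memory available per process. This is a sketch under stated assumptions: the byte widths are chosen only because they reproduce the reported sizes, `n_hs_matrices=2` (one H plus one S held at the same time, undistributed) is hypothetical, and all other memory use is ignored.

```python
MIB = 1024 ** 2

# (rows, cols, bytes per element). The byte widths are assumptions chosen so
# that the computed sizes match the report; they are not read from the code.
temporary_arrays = {
    "Auxiliary wavefunctions": (143, 6144, 8),
    "Each subspace H/S matrix": (6144, 6144, 8),
    "Each <psi_i|beta_j> matrix": (4736, 1536, 8),
    "Arrays for rho mixing": (51200, 8, 16),
}

sizes_mib = {name: r * c * b / MIB for name, (r, c, b) in temporary_arrays.items()}

def temporary_total_mib(n_hs_matrices=2):
    """Sum the temporary arrays, counting the H/S matrix n_hs_matrices times."""
    extra = (n_hs_matrices - 1) * sizes_mib["Each subspace H/S matrix"]
    return sum(sizes_mib.values()) + extra

for name, size in sizes_mib.items():
    print(f"{name:28s} {size:8.2f} MiB")

total = temporary_total_mib()
for mode, limit_mib in (("vn", 512), ("dual", 1024)):
    verdict = "exceeds" if total > limit_mib else "fits within"
    print(f"{mode}: {total:.2f} MiB {verdict} {limit_mib} MiB per process")
```

Under these assumptions, two undistributed 6144 x 6144 matrices alone come to 576 MB, which is already more than the 512 MB per process available in vn mode.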
The above case in dual mode (512 nodes, 1 GB RAM/process) was able to
get into the SCF stages, and do some output:
Self-consistent Calculation
iteration # 1 ecut= 38.22 Ry beta=0.70
Davidson diagonalization with overlap
but then died with the following memory-related error:
"regterg.f90", line 108: 1525-108 Error encountered while attempting
to allocate a data object. The program will stop.
This appears to be caused by an allocation of size (# plane waves) *
(# states), or some subset of them.
I guess this isn't really a surprise to you guys, and I am not really
sure what to do about it, but at least I am now getting some idea of
what is causing the breakdown.
Dave
On Feb 11, 2009, at 7:18 AM, Paolo Giannozzi wrote:
> Hi, any news on your BG problem? Paolo
>
> --
> Paolo Giannozzi, Democritos and University of Udine, Italy
David E. Farrell
Post-Doctoral Fellow
Department of Materials Science and Engineering
Northwestern University
email: d-farrell2 at northwestern.edu