[Pw_forum] PW taskgroups and a large run on a BG/P
Paolo Giannozzi
giannozz at democritos.it
Thu Feb 12 15:54:35 CET 2009
On Feb 12, 2009, at 15:06, David Farrell wrote:
> I found when I took the 432 atom system I sent you, and ran it on
> 128 cores in smp mode (1 MPI process/node - 2 GB per process) it
> did work (-ntg 32 -ndiag 121
32 task groups? that's a lot
> as well as -ntg 4 -ndiag 121)
4 looks more reasonable in my opinion
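for reference, the full launch line would look something like this
(the exact mpirun syntax depends on the site; on a BG/P, -mode
selects SMP/DUAL/VN):

   mpirun -np 128 -mode SMP pw.x -ntg 4 -ndiag 121 < pw.in > pw.out

(-ndiag 121 = 11x11, the square processor grid used for the parallel
diagonalization)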
> - the system didn't fit into memory in vn mode
> (4 MPI processes/node - 512 MB per process)
that job requires approx. 100 MB of dynamically allocated RAM per
process, plus a few tens of MB of work space. Why it does not fit
into 512 MB is a mystery, unless each process comes with a copy of
all the libraries. If this is the case, the maximum you can fit into
512 MB is a code printing "Hello world" in parallel.
By the way: the default number of bands in metallic calculations can
be trimmed by a significant amount (e.g. 500 instead of 576).
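i.e., set the number of bands explicitly with nbnd in the &system
namelist of the input file, something like:

   &system
      ! ... the rest of the &system variables as before ...
      nbnd = 500
   /

(500 here is just the figure above; anything comfortably above the
number of occupied states should do)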
> I then tried the system in dual mode (2 MPI processes/node - 1 GB
> per process) using -ntg 4 and -ndiag 121. In this case, the
> cholesky error came up:
the code performs exactly the same operations, independently of how
the MPI processes are distributed. It looks like yet another BlueGene
weirdness, like these:
http://www.democritos.it:8888/O-sesame/chngview?cn=5777
http://www.democritos.it:8888/O-sesame/chngview?cn=5932
which, however, affected only the internal parallel diagonalization,
not the new ScaLAPACK algorithm. I do not see any evidence that
there is anything wrong with the code itself.
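for what it's worth, a "cholesky" error of this kind usually means
that the Cholesky factorization of the overlap matrix (LAPACK's
zpotrf, or its ScaLAPACK equivalent) returned info > 0, i.e. the
matrix was found not positive definite - numerically, not
analytically. A minimal standalone illustration (made-up 2x2 matrix;
link with -llapack):

   program chol_fail
     implicit none
     integer, parameter :: n = 2
     complex(kind=kind(1.d0)) :: s(n,n)
     integer :: info
     ! a hermitian matrix with eigenvalues 3 and -1: not positive
     ! definite, so the Cholesky factorization must fail
     s(1,1) = (1.d0,0.d0); s(1,2) = (2.d0,0.d0)
     s(2,1) = (2.d0,0.d0); s(2,2) = (1.d0,0.d0)
     call zpotrf( 'U', n, s, n, info )
     print *, 'zpotrf info =', info  ! 2 here: minor of order 2 <= 0
   end program chol_fail

if the same overlap matrix passes this test with one process layout
and fails with another, the suspect is the platform's libraries, not
the algorithm.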
Paolo
---
Paolo Giannozzi, Democritos and University of Udine, Italy