[Pw_forum] PW taskgroups and a large run on a BG/P

David Farrell davidfarrell2008 at u.northwestern.edu
Thu Feb 12 15:06:35 CET 2009


Just to add to my last email: I tried to reproduce your results (and it
looks like you replied to the list before I had a chance to finish this).

I found that when I took the 432-atom system I sent you and ran it on 128
cores in smp mode (1 MPI process/node, 2 GB per process), it did work
(with -ntg 32 -ndiag 121 as well as -ntg 4 -ndiag 121). The system didn't
fit into memory in vn mode (4 MPI processes/node, 512 MB per process).
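
In case it helps, here is roughly how I am launching these. This is only a
sketch (the Cobalt qsub syntax is from memory, and the wall time, node
counts, and input file name test432.in are placeholders); only the pw.x
flags are the ones from the runs above:

  # smp mode: 1 MPI process per node, 2 GB per process (this run worked)
  qsub -t 60 -n 128 --mode smp pw.x -ntg 32 -ndiag 121 -in test432.in

  # vn mode: 4 MPI processes per node, 512 MB per process (ran out of memory)
  qsub -t 60 -n 128 --mode vn pw.x -ntg 4 -ndiag 121 -in test432.in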

The error in the vn mode case was:

  '"fft_parallel.f90", line 104: 1525-108 Error encountered while attempting to allocate a data object.  The program will stop.'

I then tried the system in dual mode (2 MPI processes/node, 1 GB per
process) using -ntg 4 and -ndiag 121. In this case, the Cholesky error
came up:

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      task #        12
      from  pdpotf  : error #         1
       problems computing cholesky decomposition
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I then tried the dual mode case with -ntg 4 and -ndiag 100, and got the
same error:

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      task #        11
      from  pdpotf  : error #         1
       problems computing cholesky decomposition
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


So now I am left wondering whether I am trying to spread the work out too
thin (it sounds like that was the case), and whether that is what is
leading to these Cholesky errors. But the dual mode failure seems to point
to something else going on as well.
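
For what it's worth, my understanding is that pw.x uses -ndiag as a square
ScaLAPACK grid (121 = 11 x 11, 100 = 10 x 10), so both values I tried
should at least be legal sizes. If the problem really is
over-decomposition, a more conservative dual mode run along these lines
might be worth trying (an untested sketch; the node count and the smaller
-ntg/-ndiag values are guesses on my part):

  # dual mode: fewer task groups and a smaller (still square) diagonalization grid
  qsub -t 60 -n 128 --mode dual pw.x -ntg 2 -ndiag 36 -in test432.in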


Dave

On Feb 12, 2009, at 6:53 AM, Paolo Giannozzi wrote:

> On Feb 11, 2009, at 19:41 , David Farrell wrote:
>
>> Let me know if the attachment doesn't make it through
>
> it didn't to the mailing list (max attachment size 40kB), but
> I received it. Attached is what I got (for an SCF calculation) on
> a Cray XT5. Apart from the bogus values of planes printed in
> the output, everything else seems ok, including parallel
> subspace diagonalization (scalapack) on 121 processors.
>
> Paolo
>
> <test432.out.gz>
> ---
> Paolo Giannozzi, Democritos and University of Udine, Italy
>
>

David E. Farrell
Post-Doctoral Fellow
Department of Materials Science and Engineering
Northwestern University
email: d-farrell2 at northwestern.edu
