[Pw_forum] PW taskgroups and a large run on a BG/P

David Farrell davidfarrell2008 at u.northwestern.edu
Thu Feb 12 22:59:21 CET 2009


I set up an 864-atom run (so this has 1920 electrons, 960 states) and
varied the number of nodes (the nodes were operating in smp mode, so 1
node == 1 core).

I ran the following cases successfully (with the current CVS version
of pw, ScaLAPACK enabled); how the -ndiag values were picked is
sketched just after the list:

512 cores, -ntg 8 -ndiag 256
256 cores, -ntg 8 -ndiag 256
128 cores, -ntg 4 -ndiag 121
64 cores, -ntg 4 -ndiag 64
32 cores, -ntg 4 -ndiag 25
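
For reference, each -ndiag above is a perfect square that fits within
the core count; as far as I understand, pw.x hands the ScaLAPACK
diagonalization a square process grid, so it effectively uses the
largest square no bigger than whatever you pass it. A rule-of-thumb
sketch (illustrative Python only, not QE code):

import math

def pick_ndiag(ncores):
    """Largest perfect square <= ncores, a rule of thumb for the
    square ScaLAPACK process grid requested via -ndiag."""
    root = math.isqrt(ncores)
    return root * root

for ncores in (512, 256, 128, 64, 32):
    print(ncores, "->", pick_ndiag(ncores))
# 512 -> 484, 256 -> 256, 128 -> 121, 64 -> 64, 32 -> 25
# (at 512 cores I used 256 rather than 484; 256 is also a valid square)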

It seemed to crash at 16 cores (-ntg 4 -ndiag 16); the error was:

'pw.x: /bglhome/bgbuild/V1R2M0_200_2008-080513P/ppc/bgp/comm/sys/build-dcmf/include/devices/dma/DMAMulticast.h:568:
static int DCMF::DMA::DMAMulticast<TDesc>::Registration::McastLongPacketHandler(void*,
DCMF::DMA::DMAMulticast<TDesc>::PacketHeader*, void*, char*, int)
[with TDesc = DMA_MemoryFifoDescriptor]: Assertion `recv != __null' failed.'

Comparing this to the 432-atom run at 128 cores, I am not convinced
that the difference in behavior between the modes is due to there not
being enough memory per core. This system is twice as large, so it
shouldn't have worked on 32 cores if memory usage scales linearly with
system size.

To put it another way:

If we assume that the 432-atom case required more than 1 GB RAM/core
to run at 128 cores (despite the output saying otherwise), then the
864-atom case would require more than 2 GB RAM/core at 128 cores,
assuming memory consumption scales linearly with system size. The
per-core footprint can only grow as the core count drops, so at 64
cores I should have seen a problem (it would presumably need something
more than 2 GB/core), and certainly by 32 cores (where it would need
at least 4 GB/core). But I don't.
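
To make the arithmetic explicit, here is the back-of-the-envelope
estimate, assuming the total footprint scales linearly with the atom
count and is split evenly over the cores (the even split is an
assumption, since some arrays are replicated), with 2 GB of RAM per
BG/P node:

# Premise: the 432-atom case needs > 1 GB/core on 128 cores.
ATOMS_REF, CORES_REF, GB_PER_CORE_REF = 432, 128, 1.0
NODE_GB = 2.0  # memory per BG/P node (all visible to the one core in smp mode)

total_ref_gb = GB_PER_CORE_REF * CORES_REF  # > 128 GB total for 432 atoms

def gb_per_core(atoms, cores):
    # Linear scaling with atom count, spread evenly over the cores.
    return total_ref_gb * (atoms / ATOMS_REF) / cores

for cores in (128, 64, 32):
    print(f"864 atoms on {cores:3d} cores: > {gb_per_core(864, cores):.0f} GB/core"
          f" (node has {NODE_GB:.0f} GB)")
# > 2, > 4, > 8 GB/core - all beyond what an smp-mode node offers, yet
# those runs completed, which is why I doubt the memory explanation.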

On the upside, this likely means that I *can* run a big system (I'll
have to play around to find out how big - especially when I get into
adding a vacuum region). The downside is that I have to do it in smp
mode and have 3 idle cores/node.

Dave



On Feb 12, 2009, at 1:16 PM, Nichols A. Romero wrote:

> David,
>
> Just to clarify, you mean:
>
> 128 nodes in vn mode = 512 cores
> 128 nodes in dual mode = 256 cores
> 128 nodes in smp mode = 128 cores
>
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne National Laboratory
> Building 360 Room L-146
> 9700 South Cass Avenue
> Argonne, IL 60490
> (630) 252-3441
>
>
> ----- Original Message -----
> From: "David Farrell" <davidfarrell2008 at u.northwestern.edu>
> To: "PWSCF Forum" <pw_forum at pwscf.org>
> Cc: "Nichols A. Romero" <naromero at alcf.anl.gov>
> Sent: Thursday, February 12, 2009 12:24:38 PM GMT -06:00 US/Canada  
> Central
> Subject: Re: [Pw_forum] PW taskgroups and a large run on a BG/P
>
>
> I pulled down the current CVS version, compiled it as I did with the
> previous snapshot, and got the same behavior:
>
>
> When I ran on 128 cores in vn mode with -ntg 4 -ndiag 121, I got a
> Cholesky error.
>
> When I ran on 128 cores in dual mode with -ntg 4 -ndiag 121, I got
> the same Cholesky error.
>
> When I ran on 128 cores in smp mode with -ntg 4 -ndiag 121, it ran
> fine.
>
>
> I guess I have 2 options:
>
>
> 1) try larger systems in SMP mode with the CVS version, see how big  
> I can get before things blow up. I'll just have to deal with the  
> extra cost of the idle CPUs.
>
>
> 2) climb into the code with a debugger to see if I can spot anything
> going on (the things I am interested in now are how much memory is
> actually available to the code, how much it is using, and whether
> there is something funny going on in the different modes). I'll
> probably have to construct a smaller system that does the same thing
> first.
>
>
> I don't want to abandon PW/CP just yet because this code has
> demonstrated decent physics, and other codes would either require me
> to develop PPs that give me results I can be confident in, or take
> way too much work to get them scalable. Unfortunately, I also need
> to get it running on the BG/P, as I have a big allocation on that
> machine that is otherwise wasted.
>
>
> Dave
>
>
>
>
>
>
>
> David E. Farrell
> Post-Doctoral Fellow
> Department of Materials Science and Engineering
> Northwestern University
> email: d-farrell2 at northwestern.edu

David E. Farrell
Post-Doctoral Fellow
Department of Materials Science and Engineering
Northwestern University
email: d-farrell2 at northwestern.edu
