[Pw_forum] Parallelization

Duy Le ttduyle at gmail.com
Wed Jun 12 16:14:11 CEST 2013


Just want to be clear, I am not Paolo !!!

If you need more memory, you should not increase the number of cores to
a huge number. Instead, you can ask for more nodes but use fewer cores
per node.

For instance, you can ask for 16 nodes and use 6 cores per node. Check
your environment, but it is highly likely that you need something like
size = 192
aprun -n 96 -N 6 pw.x ...
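
For example, on a PBS-based XT5 the relevant part of the job script could
look roughly like this (a sketch only; the walltime, the input/output file
names and the -npool 2 setting are placeholders to adapt to your case):

#PBS -l size=192
#PBS -l walltime=12:00:00

cd $PBS_O_WORKDIR
aprun -n 96 -N 6 pw.x -npool 2 < scf.in > scf.out

This requests 192 cores (16 nodes x 12 cores) from PBS but starts only 96
MPI tasks, 6 per node, so each task gets roughly twice the memory it would
have on fully populated nodes.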

If you do not want to waste half of each node, look into the OpenMP options.
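
If pw.x has been compiled with OpenMP support (configure --enable-openmp),
the otherwise idle cores can run threads instead. A possible variant of the
line above (again just a sketch; check the aprun man page on your machine):

export OMP_NUM_THREADS=2
aprun -n 96 -N 6 -d 2 pw.x -npool 2 < scf.in > scf.out

Here -d 2 reserves 2 cores per MPI task, so 6 tasks x 2 threads use all 12
cores of each node while keeping the same memory per task as above.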


----------------------------------------------------
Duy Le
Postdoctoral Associate
Department of Physics
University of Central Florida.
Website: http://www.physics.ucf.edu/~dle


On Tue, Jun 11, 2013 at 3:57 PM, vijaya subramanian
<vijaya65 at hotmail.com> wrote:
> Hi Paolo
> I am running an scf calculation on gold slabs. I have somewhat limited
> resources on a supercomputer
> and would like to optimize my runs.  (Cray XT5 with 9,408 compute nodes
> interconnected with the SeaStar router through HyperTransport. The SeaStars
> are all interconnected in a 3-D torus topology. It is a massively parallel
> processing (MPP) machine. Each compute node has two six-core 2.6 GHz AMD
> Opterons for a total of 112,896 cores. All nodes have 16 Gbytes of DDR2
> memory: 1.33 Gbytes of memory per core.)
> A 54-gold-atom slab scf calculation worked best with 120
> processors/npool=2/ndiag=49/ntg=6.
> With 240 processors I get very good speed; with 64 processors I get an
> out-of-memory issue.
> When I use a larger unit cell I run into problems.
> I have attached two files with different configurations of gold atoms in a
> slab calculation with larger unit cells.
> The unit cells are different: one has six layers of gold atoms (unit cell
> 16.12x48.36x60.8 in Bohr) and the other two layers of gold atoms (unit cell
> 54.x43.x54.).
> For some reason I cannot get the 160-atom problem to work (>2000 processors
> still doesn't work). For the 6-layer, 162-atom problem, nproc=720 works. If I
> use a smaller number of processors I get an out-of-memory problem.
> Do you have any suggestions for what the problem may be?
>
> I have given partial output for the two calculations below:
> 160 atoms, 1200 processors: the run failed before the diagonalization began.
>      Parallelization info
>      --------------------
>      sticks:   dense  smooth     PW     G-vecs:    dense   smooth
>      Min         105      31      8                24383     3975
>      Max         106      32      9                24398     4042
>      Sum       75823   22755   5881             17559633  2885465  37
>
>
>      bravais-lattice index     =            0
>      lattice parameter (alat)  =      54.5658  a.u.
>      unit-cell volume          =  129972.7994 (a.u.)^3
>      number of atoms/cell      =          160
>      number of atomic types    =            1
>      number of electrons       =      1760.00
>      number of Kohn-Sham states=         2112
>      kinetic-energy cutoff     =      30.0000  Ry
>      charge density cutoff     =     400.0000  Ry
>      convergence threshold     =      1.0E-06
>      mixing beta               =       0.7000
>      number of iterations used =            8  plain     mixing
>      Exchange-correlation      =  SLA  PW   PBX  PBC ( 1 4 3 4 0)
>      EXX-fraction              =        0.00
> ........
>      Dense  grid: 17559633 G-vectors     FFT dimensions: ( 360, 288, 360)
>
>      Smooth grid:  2885465 G-vectors     FFT dimensions: ( 192, 160, 192)
>
>      Largest allocated arrays     est. size (Mb)     dimensions
>         Kohn-Sham Wavefunctions        32.87 Mb     (   1020, 2112)
>         NL pseudopotentials            42.33 Mb     (    510, 5440)
>         Each V/rho on FFT grid          1.58 Mb     ( 103680)
>         Each G-vector array             0.19 Mb     (  24385)
>         G-vector shells                 0.09 Mb     (  11350)
>      Largest temporary arrays     est. size (Mb)     dimensions
>         Auxiliary wavefunctions       131.48 Mb     (   1020, 8448)
>         Each subspace H/S matrix        3.36 Mb     ( 469, 469)
>         Each <psi_i|beta_j> matrix    350.63 Mb     (   5440,   2, 2112)
>         Arrays for rho mixing          12.66 Mb     ( 103680,   8)
>
>      Initial potential from superposition of free atoms
>      Check: negative starting charge=   -0.028620
>
>      starting charge 1759.98221, renormalised to 1760.00000
>
>      negative rho (up, down):  0.286E-01 0.000E+00
>      Starting wfc are 2880 randomized atomic wfcs
> Application 5992317 exit signals: Killed
>
> 162 atom run:
>      Parallelization info
>      --------------------
>      sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
>      Min          34      10      2                 8950     1450     178
>      Max          35      11      3                 8981     1509     229
>      Sum       24841    7453   2003              6454371  1060521  148169
>
>
>      bravais-lattice index     =            0
>      lattice parameter (alat)  =      16.1227  a.u.
>      unit-cell volume          =   47776.5825 (a.u.)^3
>      number of atoms/cell      =          162
>      number of atomic types    =            1
>      number of electrons       =      1782.00
>      number of Kohn-Sham states=         2138
>      kinetic-energy cutoff     =      30.0000  Ry
>      charge density cutoff     =     400.0000  Ry
>      convergence threshold     =      1.0E-06
>      mixing beta               =       0.7000
>      number of iterations used =            8  plain     mixing
>      Exchange-correlation      =  SLA  PW   PBX  PBC ( 1 4 3 4 0)
>      EXX-fraction              =        0.00
>      Non magnetic calculation with spin-orbit
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum


