vijaya subramanian
vijaya65 at hotmail.com
Tue Jun 11 21:57:54 CEST 2013
Hi Paolo,
I am running an scf calculation on gold slabs. I have somewhat limited resources on a supercomputer
and would like to optimize my runs. The machine is a Cray XT5 with 9,408 compute nodes interconnected
by SeaStar routers through HyperTransport. The SeaStars are all interconnected in a 3-D
torus topology. It is a massively parallel processing (MPP) machine.
Each compute node has two six-core 2.6 GHz AMD Opterons, for a total of
112,896 cores. All nodes have 16 Gbytes of DDR2 memory, i.e. 1.33 Gbytes of
memory per core.
A 54-atom gold slab scf calculation worked best with 120 processors (npool=2, ndiag=49, ntg=6).
With 240 processors I get very good speed. With 64 processors I get an out-of-memory error.
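
To make the processor layout explicit, here is a rough sketch of how the 120 processors would be divided. It assumes the usual pw.x hierarchy (processors split first into k-point pools, each pool split into task groups, and ndiag processors per pool arranged as a square grid for the subspace diagonalization). The function name and the divisibility checks are my own illustration, not something taken from the code.

```python
import math

def describe_layout(nproc, npool, ntg, ndiag):
    """Hypothetical helper: split nproc MPI processes into pools, task groups
    and a square diagonalization grid (assumed pw.x-style hierarchy)."""
    if nproc % npool != 0:
        raise ValueError("nproc is assumed to be divisible by npool")
    per_pool = nproc // npool
    if per_pool % ntg != 0:
        raise ValueError("processes per pool are assumed to be divisible by ntg")
    side = math.isqrt(ndiag)
    if side * side != ndiag:
        raise ValueError("ndiag is assumed to be a perfect square")
    if ndiag > per_pool:
        raise ValueError("ndiag is assumed not to exceed the processes in one pool")
    return {
        "per_pool": per_pool,                 # processes working on each k-point pool
        "per_task_group": per_pool // ntg,    # processes in each FFT task group
        "diag_grid": (side, side),            # square grid used for diagonalization
    }

layout = describe_layout(nproc=120, npool=2, ntg=6, ndiag=49)
print(layout)
```

With the numbers above this gives 60 processes per pool, 10 per task group and a 7 x 7 diagonalization grid, so 11 of the 60 processes in each pool would sit outside the diagonalization grid.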
When I use a larger unit cell I run into problems.
I have attached two files with different configurations of gold atoms in a slab calculation with larger unit cells.
The unit cells are different: one has six layers of gold atoms (unit cell 16.12 x 48.36 x 60.8 in Bohr) and the other has two layers of gold atoms (unit cell 54. x 43. x 54.).
For some reason I cannot get the 160-atom problem to work (even with more than 2000 processors it still doesn't work). The six-layer, 162-atom problem works with nproc=720. If I use fewer processors I get an out-of-memory error.
Do you have any suggestions for what the problem may be?
I have given partial output for the two calculations below:
160 atoms, 1200 processors: the run failed before the diagonalization began.
Parallelization info
sticks:   dense  smooth     PW     G-vecs:    dense   smooth
Min         105      31      8                24383     3975
Max         106      32      9                24398     4042
Sum       75823   22755   5881             17559633  2885465      37
bravais-lattice index = 0
lattice parameter (alat) = 54.5658 a.u.
unit-cell volume = 129972.7994 (a.u.)^3
number of atoms/cell = 160
number of atomic types = 1
number of electrons = 1760.00
number of Kohn-Sham states= 2112
kinetic-energy cutoff = 30.0000 Ry
charge density cutoff = 400.0000 Ry
convergence threshold = 1.0E-06
mixing beta = 0.7000
number of iterations used = 8 plain mixing
Exchange-correlation = SLA PW PBX PBC ( 1 4 3 4 0)
EXX-fraction = 0.00
Dense grid: 17559633 G-vectors FFT dimensions: ( 360, 288, 360)
Smooth grid: 2885465 G-vectors FFT dimensions: ( 192, 160, 192)
Largest allocated arrays      est. size (Mb)   dimensions
  Kohn-Sham Wavefunctions         32.87 Mb     ( 1020, 2112)
  NL pseudopotentials             42.33 Mb     ( 510, 5440)
  Each V/rho on FFT grid           1.58 Mb     ( 103680)
  Each G-vector array              0.19 Mb     ( 24385)
  G-vector shells                  0.09 Mb     ( 11350)
Largest temporary arrays      est. size (Mb)   dimensions
  Auxiliary wavefunctions        131.48 Mb     ( 1020, 8448)
  Each subspace H/S matrix         3.36 Mb     ( 469, 469)
  Each <psi_i|beta_j> matrix     350.63 Mb     ( 5440, 2, 2112)
  Arrays for rho mixing           12.66 Mb     ( 103680, 8)
Initial potential from superposition of free atoms
Check: negative starting charge= -0.028620
starting charge 1759.98221, renormalised to 1760.00000
negative rho (up, down): 0.286E-01 0.000E+00
Starting wfc are 2880 randomized atomic wfcs
Application 5992317 exit signals: Killed
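
For reference, the array sizes printed above can be reproduced from the dimensions. The sketch below assumes 16 bytes per element for the complex double-precision arrays, 8 bytes for the two real G-vector arrays, and 1 Mb = 1024**2 bytes. It then adds the listed arrays once each and compares the sum with the memory available per core, assuming one MPI process per core. This is only a lower bound on the real usage, since the list covers only the largest arrays and the "Each ..." entries may be allocated several times.

```python
MB = 1024 ** 2  # bytes per Mb, assumed convention of the printed estimate

# (dimensions, assumed bytes per element) copied from the estimate in the output
arrays = {
    "Kohn-Sham Wavefunctions":    ((1020, 2112), 16),
    "NL pseudopotentials":        ((510, 5440), 16),
    "Each V/rho on FFT grid":     ((103680,), 16),
    "Each G-vector array":        ((24385,), 8),
    "G-vector shells":            ((11350,), 8),
    "Auxiliary wavefunctions":    ((1020, 8448), 16),
    "Each subspace H/S matrix":   ((469, 469), 16),
    "Each <psi_i|beta_j> matrix": ((5440, 2, 2112), 16),
    "Arrays for rho mixing":      ((103680, 8), 16),
}

def size_mb(dims, bytes_per_element):
    """Size in Mb of an array with the given dimensions."""
    n_elements = 1
    for d in dims:
        n_elements *= d
    return n_elements * bytes_per_element / MB

sizes = {name: size_mb(dims, nbytes) for name, (dims, nbytes) in arrays.items()}
total_mb = sum(sizes.values())

# 16 Gbytes per node shared by 12 cores, as in the machine description
per_core_mb = 16 * 1024 / 12

for name, mb in sizes.items():
    print(f"{name:28s} {mb:8.2f} Mb")
print(f"{'Sum of listed arrays':28s} {total_mb:8.2f} Mb")
print(f"{'Memory per core':28s} {per_core_mb:8.2f} Mb")
```

The single largest item is the <psi_i|beta_j> matrix, whose size depends on the number of Kohn-Sham states and of projectors.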
162 atom run:
Parallelization info
sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
Min          34      10      2                 8950     1450     178
Max          35      11      3                 8981     1509     229
Sum       24841    7453   2003              6454371  1060521  148169
bravais-lattice index = 0
lattice parameter (alat) = 16.1227 a.u.
unit-cell volume = 47776.5825 (a.u.)^3
number of atoms/cell = 162
number of atomic types = 1
number of electrons = 1782.00
number of Kohn-Sham states= 2138
kinetic-energy cutoff = 30.0000 Ry
charge density cutoff = 400.0000 Ry
convergence threshold = 1.0E-06
mixing beta = 0.7000
number of iterations used = 8 plain mixing
Exchange-correlation = SLA PW PBX PBC ( 1 4 3 4 0)
EXX-fraction = 0.00
Non magnetic calculation with spin-orbit
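
A quick consistency check on the two headers: both runs have 11 valence electrons per gold atom, and in both the number of Kohn-Sham states equals 1.2 times the number of electrons, rounded to the nearest integer. That would be consistent with 20% extra bands being added for a metallic system when each state holds one electron (spin-orbit), but that rule is an assumption on my part and is not shown in the output.

```python
# Values copied from the two output headers above
runs = {
    "160 atoms": {"natoms": 160, "nelec": 1760.0, "nbnd": 2112},
    "162 atoms": {"natoms": 162, "nelec": 1782.0, "nbnd": 2138},
}

results = {}
for label, run in runs.items():
    electrons_per_atom = run["nelec"] / run["natoms"]
    # Assumed rule: 20% more states than electrons, rounded to nearest integer
    guessed_nbnd = round(1.2 * run["nelec"])
    results[label] = (electrons_per_atom, guessed_nbnd)
    print(f"{label}: {electrons_per_atom:.1f} electrons/atom, "
          f"1.2 * nelec = {guessed_nbnd}, reported states = {run['nbnd']}")
```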
