<html>
<head>
</head>
<body class='hmmessage'><div dir='ltr'>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 12pt;
font-family:Calibri
}
--></style>
<div dir="ltr">Hi Paolo,<br>I am running an scf calculation on gold slabs. I have somewhat limited resources on a supercomputer<br>and would like to optimize my runs. (The machine is a Cray XT5 with 9,408 compute nodes interconnected by the SeaStar router
through HyperTransport. The SeaStars are all interconnected in a 3-D
torus topology. It is a massively parallel processing (MPP) machine.
Each compute node has two six-core 2.6 GHz AMD Opterons, for a total of
112,896 cores. All nodes have 16 Gbytes of DDR2 memory, i.e. 1.33 Gbytes of
memory per core.) <br><b>A 54 gold atom slab scf calculation worked best with 120 processors/npool=2/ndiag=49/ntg=6</b>.<br>With 240 processors I get very good speed. With 64 processors I run out of memory.<br>When I use a larger unit cell I run into problems.<br>I have attached two files with different configurations of gold atoms in a slab calculation with larger unit cells.<br>The unit cells are different: one has six layers of gold atoms (unit cell: 16.12x48.36x60.8 in Bohr) and the other has 2 layers of gold atoms (unit cell: 54.x43.x54.). <br>For some reason I cannot get the 160 atom problem to work (even more than 2000 processors still does not work). The 6 layer, 162 atom problem works with nproc=720. If I use fewer processors I get an out of memory<br>problem.<br>Do you have any suggestions for what the problem may be?<br><br>Partial output for the two calculations is given below:<br><b>160 atoms</b>, 1200 processors: the run failed before the diagonalization began.<br> Parallelization info<br> --------------------<br> sticks: dense smooth PW G-vecs: dense smooth<br> Min 105 31 8 24383 3975<br> Max 106 32 9 24398 4042<br> Sum 75823 22755 5881 17559633 2885465 37<br><br><br> bravais-lattice index = 0<br> lattice parameter (alat) = 54.5658 a.u.<br> unit-cell volume = 129972.7994 (a.u.)^3<br> number of atoms/cell = 160<br> number of atomic types = 1<br> number of electrons = 1760.00<br> number of Kohn-Sham states= 2112<br> kinetic-energy cutoff = 30.0000 Ry<br> charge density cutoff = 400.0000 Ry<br> convergence threshold = 1.0E-06<br> mixing beta = 0.7000<br> number of iterations used = 8 plain mixing<br> Exchange-correlation = SLA PW PBX PBC ( 1 4 3 4 0)<br> EXX-fraction = 0.00<br>........<br> Dense grid: 17559633 G-vectors FFT dimensions: ( 360, 288, 360)<br><br> Smooth grid: 2885465 G-vectors FFT dimensions: ( 192, 160, 192)<br><br> Largest allocated arrays est. 
size (Mb) dimensions<br> Kohn-Sham Wavefunctions 32.87 Mb ( 1020, 2112)<br> NL pseudopotentials 42.33 Mb ( 510, 5440)<br> Each V/rho on FFT grid 1.58 Mb ( 103680)<br> Each G-vector array 0.19 Mb ( 24385)<br> G-vector shells 0.09 Mb ( 11350)<br> Largest temporary arrays est. size (Mb) dimensions<br> Auxiliary wavefunctions 131.48 Mb ( 1020, 8448)<br> Each subspace H/S matrix 3.36 Mb ( 469, 469)<br> Each &lt;psi_i|beta_j&gt; matrix 350.63 Mb ( 5440, 2, 2112)<br> Arrays for rho mixing 12.66 Mb ( 103680, 8)<br><br> Initial potential from superposition of free atoms<br> Check: negative starting charge= -0.028620<br><br> starting charge 1759.98221, renormalised to 1760.00000<br><br> negative rho (up, down): 0.286E-01 0.000E+00<br> Starting wfc are 2880 randomized atomic wfcs<br>Application 5992317 exit signals: Killed<br><br><u><b>162 atom run</b></u>:<br> Parallelization info<br> --------------------<br> sticks: dense smooth PW G-vecs: dense smooth PW<br> Min 34 10 2 8950 1450 178<br> Max 35 11 3 8981 1509 229<br> Sum 24841 7453 2003 6454371 1060521 148169<br><br><br> bravais-lattice index = 0<br> lattice parameter (alat) = 16.1227 a.u.<br> unit-cell volume = 47776.5825 (a.u.)^3<br> number of atoms/cell = 162<br> number of atomic types = 1<br> number of electrons = 1782.00<br> number of Kohn-Sham states= 2138<br> kinetic-energy cutoff = 30.0000 Ry<br> charge density cutoff = 400.0000 Ry<br> convergence threshold = 1.0E-06<br> mixing beta = 0.7000<br> number of iterations used = 8 plain mixing<br> Exchange-correlation = SLA PW PBX PBC ( 1 4 3 4 0)<br> EXX-fraction = 0.00<br> Non magnetic calculation with spin-orbit<br><br></div>
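<div dir="ltr">As a rough cross-check of the memory estimates in the 160 atom output above, the sketch below recomputes the per-process sizes of the complex arrays from their printed dimensions and compares the sum with the memory available per core. It assumes that each element is a 16-byte double-precision complex number (which reproduces the printed figures), that "Mb" means 2^20 bytes, and that 1 Gbyte is 1024 Mb. The function and variable names are purely illustrative and are not part of pw.x. Only the arrays listed in the output are counted, and the H/S and psi-beta matrices are counted once each, so the real footprint is larger.<br><br></div>
<pre>
```python
from math import prod

BYTES_PER_COMPLEX = 16      # assumption: double-precision complex elements
MIB = 1024 ** 2             # assumption: "Mb" in the output means 2**20 bytes


def array_mb(*dims):
    """Size in Mb of a complex array with the given dimensions."""
    return prod(dims) * BYTES_PER_COMPLEX / MIB


# Dimensions copied from the 160 atom output (per process, 1200 processes).
arrays = {
    "Kohn-Sham wavefunctions": (1020, 2112),
    "NL pseudopotentials": (510, 5440),
    "Auxiliary wavefunctions": (1020, 8448),
    "One subspace H/S matrix": (469, 469),
    "One psi-beta matrix": (5440, 2, 2112),
}

sizes = {name: array_mb(*dims) for name, dims in arrays.items()}
total_mb = sum(sizes.values())

# 16 Gbytes per node shared by 12 cores, taking 1 Gbyte as 1024 Mb.
per_core_mb = 16 * 1024 / 12

for name, mb in sizes.items():
    print(f"{name:26s} {mb:8.2f} Mb")
print(f"{'Listed arrays, total':26s} {total_mb:8.2f} Mb")
print(f"{'Memory per core':26s} {per_core_mb:8.2f} Mb")
```
</pre>
<div dir="ltr">Under these assumptions the listed arrays alone come to about 560 Mb per process, against roughly 1365 Mb per core. The psi-beta matrix is the largest single item, and its dimensions (5440, 2, 2112) contain no plane-wave index, which suggests it may not shrink as more processors are added.<br><br></div>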
</div>
</div>
</div>
</div></body>
</html>