Hi,
Thanks for your response. I am willing to accept help from anyone :)
Paolo had helped me earlier regarding parallelization.
I'll try what you said; a rough sketch of that job layout is appended after the quoted thread below.
Vijaya
UNM

> From: ttduyle@gmail.com
> Date: Wed, 12 Jun 2013 10:14:11 -0400
> To: pw_forum@pwscf.org
> Subject: Re: [Pw_forum] Parallelization
>
> Just to be clear, I am not Paolo!
>
> If you need more memory, you should not increase the number of cores to a
> huge number. Instead, ask for more nodes but use fewer cores per node.
>
> For instance, you can ask for 16 nodes and use 6 cores per node. Check
> your environment, but it is highly likely that you will need something like
>     size = 192
>     aprun -n 96 -N 6 pw.x ...
>
> If you don't want to waste half of each node, look into the OpenMP options.
>
>
> ----------------------------------------------------
> Duy Le
> Postdoctoral Associate
> Department of Physics
> University of Central Florida.
> Website: http://www.physics.ucf.edu/~dle
>
>
> On Tue, Jun 11, 2013 at 3:57 PM, vijaya subramanian
> <vijaya65@hotmail.com> wrote:
> > Hi Paolo,
> > I am running an SCF calculation on gold slabs. I have somewhat limited
> > resources on a supercomputer and would like to optimize my runs.
> > (The machine is a Cray XT5 with 9,408 compute nodes interconnected with
> > the SeaStar router through HyperTransport; the SeaStars are all
> > interconnected in a 3-D torus topology. It is a massively parallel
> > processing (MPP) machine. Each compute node has two six-core 2.6 GHz AMD
> > Opterons, for a total of 112,896 cores. All nodes have 16 Gbytes of DDR2
> > memory: 1.33 Gbytes of memory per core.)
> > A 54-gold-atom slab SCF calculation worked best with 120 processors,
> > npool=2, ndiag=49, ntg=6.
> > With 240 processors I get very good speed; with 64 processors I get an
> > out-of-memory error.
> > When I use a larger unit cell I run into problems.
> > I have attached two files with different configurations of gold atoms in
> > a slab calculation with larger unit cells.
> > The unit cells are different: one has six layers of gold atoms (unit cell
> > 16.12 x 48.36 x 60.8 Bohr) and the other has two layers of gold atoms
> > (unit cell 54. x 43. x 54.).
> > For some reason I cannot get the 160-atom problem to work (even with more
> > than 2000 processors it still fails). For the six-layer, 162-atom
> > problem, nproc=720 works.
> > If I use fewer processors I get an out-of-memory error.
> > Do you have any suggestions as to what the problem may be?
> >
> > I have given partial output for the two calculations below.
> >
> > 160 atoms, 1200 processors - the run failed before the diagonalization began:
> >
> >   Parallelization info
> >   --------------------
> >   sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
> >   Min         105      31      8               24383     3975
> >   Max         106      32      9               24398     4042
> >   Sum       75823   22755   5881            17559633  2885465      37
> >
> >   bravais-lattice index     =            0
> >   lattice parameter (alat)  =      54.5658  a.u.
> >   unit-cell volume          =  129972.7994 (a.u.)^3
> >   number of atoms/cell      =          160
> >   number of atomic types    =            1
> >   number of electrons       =      1760.00
> >   number of Kohn-Sham states=         2112
> >   kinetic-energy cutoff     =      30.0000  Ry
> >   charge density cutoff     =     400.0000  Ry
> >   convergence threshold     =      1.0E-06
> >   mixing beta               =       0.7000
> >   number of iterations used =            8  plain  mixing
> >   Exchange-correlation      =  SLA  PW  PBX  PBC ( 1 4 3 4 0)
> >   EXX-fraction              =        0.00
> >   ........
> >   Dense  grid: 17559633 G-vectors     FFT dimensions: ( 360, 288, 360)
> >   Smooth grid:  2885465 G-vectors     FFT dimensions: ( 192, 160, 192)
> >
> >   Largest allocated arrays     est. size (Mb)     dimensions
> >      Kohn-Sham Wavefunctions        32.87 Mb     (   1020, 2112)
> >      NL pseudopotentials            42.33 Mb     (    510, 5440)
> >      Each V/rho on FFT grid          1.58 Mb     ( 103680)
> >      Each G-vector array             0.19 Mb     (  24385)
> >      G-vector shells                 0.09 Mb     (  11350)
> >   Largest temporary arrays     est. size (Mb)     dimensions
> >      Auxiliary wavefunctions       131.48 Mb     (   1020, 8448)
> >      Each subspace H/S matrix        3.36 Mb     (    469,  469)
> >      Each <psi_i|beta_j> matrix    350.63 Mb     (   5440,    2, 2112)
> >      Arrays for rho mixing          12.66 Mb     ( 103680,    8)
> >
> >   Initial potential from superposition of free atoms
> >   Check: negative starting charge=   -0.028620
> >
> >   starting charge 1759.98221, renormalised to 1760.00000
> >
> >   negative rho (up, down):  0.286E-01 0.000E+00
> >   Starting wfc are 2880 randomized atomic wfcs
> >
> >   Application 5992317 exit signals: Killed
> >
> > 162-atom run:
> >
> >   Parallelization info
> >   --------------------
> >   sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
> >   Min          34      10      2                8950     1450     178
> >   Max          35      11      3                8981     1509     229
> >   Sum       24841    7453   2003             6454371  1060521  148169
> >
> >   bravais-lattice index     =            0
> >   lattice parameter (alat)  =      16.1227  a.u.
> >   unit-cell volume          =   47776.5825 (a.u.)^3
> >   number of atoms/cell      =          162
> >   number of atomic types    =            1
> >   number of electrons       =      1782.00
> >   number of Kohn-Sham states=         2138
> >   kinetic-energy cutoff     =      30.0000  Ry
> >   charge density cutoff     =     400.0000  Ry
> >   convergence threshold     =      1.0E-06
> >   mixing beta               =       0.7000
> >   number of iterations used =            8  plain  mixing
> >   Exchange-correlation      =  SLA  PW  PBX  PBC ( 1 4 3 4 0)
> >   EXX-fraction              =        0.00
> >   Non magnetic calculation with spin-orbit
> >
> _______________________________________________
> Pw_forum mailing list
> Pw_forum@pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
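For reference, here is a rough sketch of the job layout Duy describes, assuming a PBS-style scheduler on the XT5 (as the "size = 192" line suggests). It reserves 16 full nodes (16 x 12 cores = size 192) but starts only 6 MPI ranks per node, so each rank sees roughly 2.7 GB of memory instead of 1.3 GB. The walltime, job name, file names, and the pw.x flags (npool/ndiag/ntg, copied from the 54-atom example) are placeholders and would need retuning for the larger cells:

    #!/bin/bash
    #PBS -l size=192               # 16 XT5 nodes x 12 cores, reserved in full
    #PBS -l walltime=06:00:00      # placeholder walltime
    #PBS -N au_slab_scf            # placeholder job name

    cd $PBS_O_WORKDIR

    # 96 MPI ranks in total, 6 per node -> ~2.7 GB of memory per rank
    aprun -n 96 -N 6 pw.x -npool 2 -ndiag 49 -ntg 6 -in au_slab.in > au_slab.out

If pw.x was built with OpenMP support, the six otherwise idle cores on each node can be used as threads instead of being wasted, e.g.:

    # hybrid MPI+OpenMP variant (requires an OpenMP-enabled pw.x build)
    export OMP_NUM_THREADS=2
    aprun -n 96 -N 6 -d 2 pw.x -npool 2 -ndiag 49 -ntg 6 -in au_slab.in > au_slab.out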