
<p>Hi, I found in the new (or maybe not so new) user's guide that systems of 1000 atoms or so can be calculated, and that there are new ways to parallelize.</p><p>The example in the manual is:<br></p><pre><font size="4"> mpirun -np 4096 ./pw.x -nimage 8 -npool 2 -ntg 8 -ndiag 144 -input myinput.in</font></pre>

<p>I have played with it a bit, though not on a massive computer, and I have found that the default options are always better than my inexpert choices.</p><p>So I would like to see some hints, in addition to what is reproduced below (from the user's guide), about good choices of -ntg and -ndiag. Maybe a few examples would be enough to understand it.<br>

</p><p><br></p><p>From the user's guide:<br></p><p>This executes the PWscf code on 4096 processors, to simulate a system

with 8 images, each of which is distributed across 512 processors.

K-points are distributed across 2 pools of 256 processors each, 

3D FFT is performed using 8 task groups (64 processors each, so

the 3D real-space grid is cut into 64 slices), and the diagonalization

of the subspace Hamiltonian is distributed to a square grid of 144

processors (12x12).

</p><p>Default values are: -nimage 1 -npool 1 -ntg 1 ; ndiag is chosen

by the code as the fastest n^2 (n integer) that fits into the size

of each pool.
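</p><p>To check that I read the numbers above correctly, here is a minimal sketch of the arithmetic. The helper <code>processor_layout</code> is hypothetical (my own, not part of the code or the guide), and it assumes that each level simply divides the processors it receives, and that the default ndiag is the largest perfect square not exceeding the pool size. Note that plain division of a 256-processor pool by 8 task groups gives 32 processors per task group rather than the 64 quoted above, so I may be misreading how task groups are counted.</p>

```python
import math


def processor_layout(nproc, nimage=1, npool=1, ntg=1, ndiag=None):
    """Return how nproc MPI processes are split by the pw.x options.

    Hypothetical helper for illustration only; it is not part of the
    PWscf code and only reproduces the arithmetic of the quoted text.
    """
    sizes = {}
    remaining = nproc
    # Assumption: each level must divide the processors it receives evenly.
    for name, count in (("image", nimage), ("pool", npool), ("task_group", ntg)):
        if remaining % count != 0:
            raise ValueError(f"{remaining} processors cannot be split into {count} {name}s")
        remaining //= count
        sizes[name] = remaining  # processors per image / pool / task group

    if ndiag is None:
        # Assumption: the default is the largest perfect square that
        # does not exceed the pool size.
        side = math.isqrt(sizes["pool"])
    else:
        side = math.isqrt(ndiag)
        if side * side != ndiag:
            raise ValueError("ndiag must be a perfect square")
        if ndiag > sizes["pool"]:
            raise ValueError("ndiag does not fit into one pool")
    sizes["diag_side"] = side  # side of the square diagonalization grid
    return sizes


# The command line from the manual.
example = processor_layout(4096, nimage=8, npool=2, ntg=8, ndiag=144)
print(example)
```

<p>For the manual's command this gives 512 processors per image, 256 per pool and a 12x12 diagonalization grid, in agreement with the text.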

</p><p><b>Massively parallel calculations</b>:

For very large jobs (i.e. O(1000) atoms or so) or for very long jobs to be run on massively 

parallel  machines (e.g. IBM BlueGene) it is crucial to use in an effective way both the

"task group" and the "ortho group" parallelization. Without a judicious choice of parameters,

large jobs will find a stumbling block in either memory or 

CPU requirements. In particular, the "ortho group" parallelization is used in the diagonalization 

of matrices in the subspace of Kohn-Sham states (whose dimension is as a strict minimum equal to

the number of occupied states). These are stored as block-distributed matrices (distributed

across processors) and diagonalized using custom-tailored diagonalization algorithms

that work on block-distributed matrices.
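</p><p>As I understand "block-distributed", each processor of the square grid stores one contiguous block of the matrix. The sketch below is my own illustration under that assumption; the functions <code>block_owner</code> and <code>local_shape</code> are hypothetical, the matrix size of 1000 is just an example value, and the layout actually used by the code may differ.</p>

```python
def block_owner(i, j, n, side):
    """Grid position (row, col) of the process holding element (i, j).

    Assumes an n x n matrix cut into side x side contiguous blocks,
    one block per process. This is a hypothetical layout for
    illustration, not necessarily the one PWscf uses.
    """
    block = -(-n // side)  # ceiling of n / side: rows (and columns) per block
    return i // block, j // block


def local_shape(row, col, n, side):
    """Number of rows and columns stored by process (row, col)."""
    block = -(-n // side)
    rows = max(0, min(block, n - row * block))
    cols = max(0, min(block, n - col * block))
    return rows, cols


# Example values (assumed): 1000 states on the 12x12 grid from -ndiag 144.
n_states, grid_side = 1000, 12
print(block_owner(999, 500, n_states, grid_side))
print(local_shape(0, 0, n_states, grid_side))
print(local_shape(11, 0, n_states, grid_side))
```

<p>With these example values each processor holds a block of at most 84x84 elements instead of the full 1000x1000 matrix, which is where the memory saving would come from.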

</p>Thanks<br clear="all"><br>-- <br>Eduardo Menendez<br>Departamento de Fisica<br>Facultad de Ciencias<br>Universidad de Chile<br>Phone: (56)(2)9787439<br>URL: <a href="http://fisica.ciencias.uchile.cl/~emenendez">http://fisica.ciencias.uchile.cl/~emenendez</a><br>