[Pw_forum] Fwd: nstep, npool and FFT grid
Nicola Marzari
marzari at MIT.EDU
Sat Aug 25 23:35:35 CEST 2007
Dear Bhagawan,
you raise several relevant questions - hopefully someone else can help as
well. Testing your specific system, on your cluster with your
communication hardware, is in any case the most important strategy.
1) the "***" come from a format declaration (i3, most likely). We should
switch to i4, but in the meantime you should be able to fish in the pw
source for where that line is printed, and change the output format yourself.
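As an illustration only (the exact file, variable names and format string
differ between versions): the grid dimensions are written with something
like an i3 edit descriptor, so any dimension above 999 overflows to "***";
widening the descriptor to i4 fixes the printout.

      ! illustrative sketch - locate the actual WRITE in the pw summary routine
      WRITE( stdout, '(5X,"FFT grid: (",i3,",",i3,",",i3,")")' ) nr1, nr2, nr3
      ! widened field: prints grid dimensions up to 9999 without overflowing
      WRITE( stdout, '(5X,"FFT grid: (",i4,",",i4,",",i4,")")' ) nr1, nr2, nr3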
I'm not sure about the fine details of FFT parallelization.
Stefano de Gironcoli would probably be the person to ask, but I think he
is currently traveling.
2) this is easily answered by looking at INPUT_PW.
Electronic minimizations (at fixed ions) are controlled by

electron_maxstep   INTEGER ( default = 100 )
     maximum number of iterations in a scf step

conv_thr           REAL ( default = 1.D-6 )
     convergence threshold for self-consistency:
     estimated energy error < conv_thr

For ionic relaxations, at each ionic step the electrons are relaxed as
above, and the ionic relaxations are controlled by

etot_conv_thr      REAL ( default = 1.0D-4 )
     convergence threshold on total energy (a.u.) for ionic
     minimization: the convergence criterion is satisfied
     when the total energy changes by less than etot_conv_thr
     between two consecutive scf steps.
     See also forc_conv_thr - both criteria must be satisfied

forc_conv_thr      REAL ( default = 1.0D-3 )
     convergence threshold on forces (a.u.) for ionic
     minimization: the convergence criterion is satisfied
     when all components of all forces are smaller than
     forc_conv_thr.
     See also etot_conv_thr - both criteria must be satisfied
The code will do a maximum of "nstep" ionic steps.
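To make this concrete, here is a minimal, illustrative fragment of a
"relax" input that sets these parameters explicitly (the other required
namelists and cards, e.g. &SYSTEM, ATOMIC_SPECIES, ATOMIC_POSITIONS,
K_POINTS, are omitted):

   &CONTROL
      calculation   = 'relax'
      nstep         = 250      ! at most 250 ionic steps
      etot_conv_thr = 1.0D-4   ! ionic convergence: energy change
      forc_conv_thr = 1.0D-3   ! ionic convergence: forces
   /
   &ELECTRONS
      electron_maxstep = 100   ! scf iterations per ionic step
      conv_thr         = 1.D-6 ! scf convergence threshold
   /
   &IONS
      ion_dynamics = 'bfgs'
   /

At each of the (at most) 250 ionic steps, the scf cycle runs until conv_thr
is met or electron_maxstep iterations are reached; the relaxation stops when
both etot_conv_thr and forc_conv_thr are satisfied.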
3) npools partitioning doesn't decrease the amount of memory each
processor needs, so it is useful for small systems with many k-points.
It also performs well when communication is slow. If you have 12
k-points, run on 1, 2, 3, 4, 6, or 12 processors. G-parallelization is a
tad faster if your communications are excellent, but for larger and
larger systems it's difficult to keep perfect scaling. On the other hand,
a G-parallel job is partitioned into smaller chunks, so this
parallelization is necessary if you study systems that do not fit on 1
processor alone.
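As an illustration of the trade-off (the flag spelling, -npool vs -npools,
may depend on the version; if no pool option is given, the job runs as a
single pool): with 12 k-points on 12 processors you could launch, e.g.,

   mpirun -np 12 pw.x -npool 12 < system.in > system.out   # 12 pools x 1 proc: pure k-point parallelization
   mpirun -np 12 pw.x -npool 4  < system.in > system.out   # 4 pools x 3 procs: G-vectors/FFT split inside each pool

The first line minimizes communication; the second also distributes the
wavefunctions and FFTs over the 3 processors inside each pool, reducing
the memory needed per processor.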
Not sure about the other questions on fft performance etc.
nicola
brsahu at physics.utexas.edu wrote:
> Dear PWSCF users,
>
> I submitted a query a few days ago. If you have any suggestions, please let me know.
>
> Bhagawan
>
> ----- Forwarded message from brsahu at physics.utexas.edu -----
> Date: Wed, 22 Aug 2007 15:55:09 -0500
> From: brsahu at physics.utexas.edu
> Reply-To: brsahu at physics.utexas.edu
> Subject: nstep, npool and FFT grid
> To: pw_forum at pwscf.org
>
> Dear pwscf users,
>
> I have two questions regarding
>
> 1) FFT grid
>
> In a system where the FFT grids along the x-, y- and z-directions are not
> equal, which one of these decides the PW parallelization efficiency?
>
> I have a system with about 200 atoms. In the output, I get
>
> G cutoff = 164.2214 (2183159 G-vectors) FFT grid: ( 27,***,144)
> G cutoff = 43.7924 ( 301345 G-vectors) smooth grid: ( 15,625, 72)
>
> There are '***' printed for the y-dir FFT grid.
>
> The latest (CVS) version of the PWSCF code is compiled on a Linux cluster.
>
> Can something be changed (the array or format declaration for the FFT
> grid) so that the FFT grid is printed for large systems, allowing me to
> tune the number of processors for G-vector and/or k-point parallelization?
>
> 2) nstep and npool
>
> nstep (according to the INPUT_PW file in Doc) is the number of
> ionic+electronic steps; by default it is 1 for scf, nscf, and bands, 0
> for neb etc., and 50 for "relax", vc-relax, etc.
>
> I have a run with "relax". I set nstep=250; does that mean it will do
> 250 ionic steps? Is there a default for the electronic steps? I could
> not find a separate tag for electronic steps.
>
> Also, how does one estimate whether k-point parallelization or G-vector
> (FFT) parallelization will be faster for a given system? The section on
> parallelization issues in the user's guide suggests choosing the number
> of processors so that the third dimension of the FFT grid is divisible
> by it, and choosing the number of pools so that the number of k-points
> is divisible by it. Is there a default number of pools that a running
> job assumes if it is not specified in the "job" submission script, or
> does one have to specify it explicitly?
>
> Does the number of pools have to divide the number of processors
> specified in the "job" submission, with (# of proc.)/(# of pools) being
> the number of processors that should be a divisor of the third FFT grid
> dimension?
--
---------------------------------------------------------------------
Prof Nicola Marzari Department of Materials Science and Engineering
13-5066 MIT 77 Massachusetts Avenue Cambridge MA 02139-4307 USA
tel 617.4522758 fax 2586534 marzari at mit.edu http://quasiamore.mit.edu