[Pw_forum] default parallelization and parallelization of bands.x

Axel Kohlmeyer akohlmey at gmail.com
Mon Dec 7 15:04:22 CET 2015


On Mon, Dec 7, 2015 at 8:51 AM, Maxim Skripnik
<maxim.skripnik at uni-konstanz.de> wrote:
> So you mean it's not normal that bands.x takes more than 7 hours? What's
> suspicious is that the reported actual CPU time is much less, only 16
> minutes. What could be the problem?

Most likely you don't have enough RAM, or there are other running processes
consuming too much of it. When the OS has to push memory out to swap space,
things slow to a crawl: disk I/O is several orders of magnitude slower than
memory I/O.
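
A quick way to confirm this, assuming you can log in to the compute nodes while
the job is running, is to watch memory and swap usage with standard Linux tools,
for example:

  free -h     # total, used and free RAM and swap on the node
  vmstat 5    # the si/so columns report swap-in/swap-out traffic every 5 seconds

If si/so stay nonzero while bands.x is running, the job is swapping, and the wall
time will balloon far beyond the CPU time, exactly as in the output below.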

axel.


> Here's the output of a bands.x calculation:
>
>      Program BANDS v.5.1.2 starts on  5Dec2015 at  9:15:18
>
>      This program is part of the open-source Quantum ESPRESSO suite
>      for quantum simulation of materials; please cite
>          "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
>           URL http://www.quantum-espresso.org",
>      in publications or presentations arising from this work. More details
> at
>      http://www.quantum-espresso.org/quote
>
>      Parallel version (MPI), running on    64 processors
>      R & G space division:  proc/nbgrp/npool/nimage =      64
>
>      Reading data from directory:
>      ./tmp/Ni3HTP2.save
>
>    Info: using nr1, nr2, nr3 values from input
>
>    Info: using nr1s, nr2s, nr3s values from input
>
>      IMPORTANT: XC functional enforced from input :
>      Exchange-correlation      =  SLA  PW   PBE  PBE ( 1  4  3  4 0 0)
>      Any further DFT definition will be discarded
>      Please, verify this is what you really want
>
>                file H.pbe-rrkjus.UPF: wavefunction(s)  1S renormalized
>                file C.pbe-rrkjus.UPF: wavefunction(s)  2S 2P renormalized
>                file N.pbe-rrkjus.UPF: wavefunction(s)  2S renormalized
>                file Ni.pbe-nd-rrkjus.UPF: wavefunction(s)  4S renormalized
>
>      Parallelization info
>      --------------------
>      sticks:   dense  smooth     PW     G-vecs:    dense   smooth      PW
>      Min         588     588    151                92668    92668   12083
>      Max         590     590    152                92671    92671   12086
>      Sum       37643   37643   9677              5930831  5930831  773403
>
>
>      Check: negative/imaginary core charge=   -0.000004    0.000000
>
>      negative rho (up, down):  2.225E-03 0.000E+00
>      high-symmetry point:  0.0000 0.0000 0.4981   x coordinate   0.0000
>      high-symmetry point:  0.3332 0.5780 0.4981   x coordinate   0.6672
>      high-symmetry point:  0.5000 0.2890 0.4981   x coordinate   1.0009
>      high-symmetry point:  0.0000 0.0000 0.4981   x coordinate   1.5784
>
>      Plottable bands written to file bands.out.gnu
>      Bands written to file bands.out
>
>      BANDS        :     0h16m CPU        7h38m WALL
>
>
>    This run was terminated on:  16:53:49   5Dec2015
>
> =------------------------------------------------------------------------------=
>    JOB DONE.
> =------------------------------------------------------------------------------=
>
>
>
>
>
> Am Samstag, 05. Dezember 2015 21:03 CET, stefano de gironcoli
> <degironc at sissa.it> schrieb:
>
>
>
>
> The only parallelization that I see in bands.x is the basic one over R & G. If
> it is different from the parallelization used previously, you should use
> wf_collect.
> The code computes the overlap between the orbitals at k and k+dk in order to
> decide how to connect them. It is an nbnd^2 operation done band by band; not
> very efficient, evidently, but it should not take hours.
> You can set wf_collect=.true. and increase the number of processors.
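>
> For reference, wf_collect belongs to the &control namelist of the pw.x input
> that precedes the bands.x run. A minimal sketch of just that fragment, where
> the 'bands' calculation type is assumed for the non-self-consistent run along
> the k-path and prefix/outdir are taken from the job discussed in this thread:
>
>  &control
>     calculation = 'bands'
>     prefix      = 'st1'
>     outdir      = './tmp'
>     wf_collect  = .true.
>  /
>
> With the wavefunctions collected into the .save directory, the post-processing
> step is no longer tied to the processor grid of the preceding pw.x run.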
>
> stefano
>
>
> On 05/12/2015 12:57, Maxim Skripnik wrote:
>
> Thank you for the information. Yes, at the beginning of the pw.x output it
> says:
>      Parallel version (MPI), running on    64 processors
>      R & G space division:  proc/nbgrp/npool/nimage =      64
>
> Is bands.x parallelized at all? If so, where can I find information on that?
> There's nothing mentioned in the documentation:
> http://www.quantum-espresso.org/wp-content/uploads/Doc/pp_user_guide.pdf
> http://www.quantum-espresso.org/wp-content/uploads/Doc/INPUT_BANDS.html
>
> What could be the reason for bands.x taking many hours to calculate the
> bands? The preceding pw.x calculation has already determined the band energies
> at each k-point along the path (Gamma -> K -> M -> Gamma). There are 61 k-points
> and 129 bands. So what is bands.x actually doing besides reformatting that
> data? The input file job.bands looks like this:
>  &bands
>     prefix   = 'st1'
>     outdir   = './tmp'
> /
> The calculation is initiated by
> mpirun -np 64 bands.x < job.bands
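>
> For completeness, the name of the output file can be chosen with the filband
> keyword; with the minimal input above it takes its default value 'bands.out',
> which is why the run quoted earlier reports bands.out and bands.out.gnu. The
> same input with the default made explicit would be:
>
>  &bands
>     prefix   = 'st1'
>     outdir   = './tmp'
>     filband  = 'bands.out'
>  /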
>
> Maxim Skripnik
> Department of Physics
> University of Konstanz
>
> Am Samstag, 05. Dezember 2015 02:37 CET, stefano de gironcoli
> <degironc at sissa.it> schrieb:
>
>
>
>
> On 04/12/2015 22:53, Maxim Skripnik wrote:
>
> Hello,
>
> I'm a bit confused by the parallelization scheme of QE. First of all, I run
> calculations on a cluster with usually 1 to 8 nodes, each of which has 16
> cores. pw.x scales very well, e.g. for structural relaxation jobs. I do not
> specify any particular parallelization scheme mentioned in the documentation,
> i.e. I start the calculations with
> mpirun -np 128 pw.x < job.pw
> on 8 nodes with 16 cores each. According to the documentation this means ni=1,
> nk=1 and nt=1. So in which respect are the calculations parallelized by
> default? Why do they scale so well without specifying ni, nk, nt or nd?
>
> R and G parallelization is performed.
> The wavefunctions' plane waves, the density's plane waves and slices of the
> real-space objects are distributed across the 128 processors. A report of how
> this is done is given at the beginning of the output.
> Did you have a look at it?
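>
> The other levels (images, k-point pools, task groups, ...) are used only when
> requested explicitly on the command line. A sketch, where the pool count 8 is
> just an illustration (it should divide the total number of MPI ranks and not
> exceed the number of k-points):
>
> mpirun -np 128 pw.x -nk 8 < job.pw
>
> With -nk 8 the k-points are split over 8 pools, and the R & G distribution
> described above is then done over the 16 processors inside each pool.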
>
>
> My second question is whether one can speed up bands.x calculations. So far I
> have started them this way:
> mpirun -np 64 bands.x < job.bands
> on 4 nodes, 16 cores each. Does it make sense to define nb for bands.x? If
> yes, what would be reasonable values?
>
> Expect no gain: band parallelization is not implemented in bands.x.
>
> stefano
>
>
>
>
>
>
>
>
> The systems of interest consist of typically ~50 atoms with periodic
> boundaries.
>
> Maxim Skripnik
> Department of Physics
> University of Konstanz
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum



-- 
Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste, Italy.


