[Q-e-developers] Behavior of band-parallelization in QE 6.1

Wed Jul 12 19:15:10 CEST 2017

Hi Ryan,

   As Paolo said, band parallelization in QE 6.1 has been modified
significantly.  One of the main changes is that the local part of the
calculation (basically, everything outside of PW/src/exx.f90) is performed
as though there is only a single band group.  This change enables the code
to avoid duplicating all of the work associated with the local part of the
calculation across each band group, which improves the efficiency of the
parallelization, especially when running at scale.

   As you observed, the new behavior in QE 6.1 is that the plane waves are
distributed across all processors, independent of the number of band
groups.  When running with a very large number of processors, QE 6.1 may
complain that there are not enough plane waves and exit, just as it would
if you were running with a local or semi-local functional.
There is no fundamental reason why the code can't be modified to run with
some processors having no plane waves; in fact, I have done some work to
create a patch that does exactly this.  It isn't finished yet, but when it
is, I could share it with you if you would like.
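A back-of-the-envelope sketch of why QE 6.1 runs out of plane waves sooner, written in the shell style of the run script quoted below. It assumes (a simplified model, not taken from the QE source) that in QE 6.0 each band group spreads the full set of plane waves over its own nproc/nbgrp processes, while in QE 6.1 the local part spreads them over all nproc processes regardless of nbgrp. The optional factor of 1/2 for Gamma tricks follows Ryan's description. `pw_per_proc` is a hypothetical helper, not a QE routine, and the plane-wave total in the example is a made-up number.

```shell
#!/usr/bin/env bash
# Rough estimate of the plane waves held by each MPI process.
# Hypothetical helper, not part of QE; integer arithmetic only.
#   $1 = total number of plane waves (hypothetical input)
#   $2 = total number of MPI processes
#   $3 = number of band groups (-nb)
#   $4 = "6.0" or "6.1" (assumed distribution model for each version)
#   $5 = 1 to apply the factor of 1/2 for Gamma tricks, 0 otherwise
pw_per_proc() {
  local npw_tot=$1 nproc=$2 nbgrp=$3 version=$4 gamma=$5
  local procs_sharing_pw
  if [ "$version" = "6.0" ]; then
    # Assumed QE 6.0 model: plane waves split only within one band group
    procs_sharing_pw=$(( nproc / nbgrp ))
  else
    # Assumed QE 6.1 model: the local part ignores band groups
    procs_sharing_pw=$nproc
  fi
  local npw=$(( npw_tot / procs_sharing_pw ))
  if [ "$gamma" -eq 1 ]; then
    npw=$(( npw / 2 ))
  fi
  echo "$npw"
}

# Made-up total of 64000 plane waves on 640 processes with -nb 10
pw_per_proc 64000 640 10 6.0 0   # 1000 per process
pw_per_proc 64000 640 10 6.1 0   # 100 per process
```

Under this model the per-process count in QE 6.1 is smaller by a factor of nbgrp, so adding band groups no longer keeps each process above the minimum number of plane waves.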

   That having been said, I am curious about whether you are actually
benefiting from running on 8000 processors.  From your output files for the
zinc dimer, it looks like your best walltime with 20 procs (39 s) is about
the same as your best walltime with 320 procs, i.e. a parallel efficiency of
only about 20/320 (roughly 6%) for a 16-fold increase in processors, with a
significant fraction of the inefficiency associated with the
diagonalization.  I would be interested in knowing the following:

   1. How big are your production runs?  Can you show me an example input
file?

   2. Using QE 6.0, how does the walltime of an 8000+ processor calculation
compare with the walltime of a 4000 or 2000 processor calculation?

   3. For a given number of processors and the optimal choice of -nb, how
do your QE 6.0 walltimes compare with your QE 6.1 walltimes?  Keep in mind
that in QE 6.1 you can use task groups alongside band groups.  As a rough
rule of thumb, I would suggest setting -ntg to -nb/8 for any QE 6.1
calculations.
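For question 3, a sweep over band-group counts could apply the -ntg ≈ -nb/8 rule of thumb automatically. This is a dry-run sketch that only prints the mpirun commands. The list of -nb values, the default values, the output-file naming and clamping -ntg to at least 1 (integer division gives 0 for -nb below 8) are assumptions for illustration; the variable names exe, MPI_TASKS and fileVal mirror Ryan's run script, and `ntg_for_nb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Dry-run sweep over band-group counts using the rule of thumb
# ntg = nb/8, clamped to at least 1. Prints commands; does not run them.
# Defaults below are placeholders (assumptions), not values from the thread.
exe=${exe:-pw.x}
MPI_TASKS=${MPI_TASKS:-640}
fileVal=${fileVal:-zn2}

ntg_for_nb() {
  local nb=$1
  local ntg=$(( nb / 8 ))
  # Integer division gives 0 for nb < 8; use one task group as the minimum
  if [ "$ntg" -lt 1 ]; then
    ntg=1
  fi
  echo "$ntg"
}

for nband in 1 8 16 32 64; do
  ntg=$(ntg_for_nb "$nband")
  out="${fileVal}_nband${nband}_ntg${ntg}_nproc${MPI_TASKS}.out"
  # Quoted, so < and > are printed literally rather than redirecting
  echo "mpirun -n $MPI_TASKS $exe -nb $nband -ntg $ntg < ${fileVal}.in > $out"
done
```

Removing the `echo` and its quotes would turn each printed line into an actual run; printing first makes it easy to check that each -nb value divides the process count before spending allocation time.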

   Also, one more observation: You don't appear to be running with ACE.  If
you recompile with ACE you are likely to see a significant speedup (often
5-10x) for your hybrid calculations.

Best,
Taylor

On Wed, Jul 12, 2017 at 6:09 AM, Paolo Giannozzi <paolo.giannozzi at uniud.it> wrote:

> Dear Ryan
>
> the band parallelization for exact exchange has undergone some
> reshuffling. See this paper: Taylor A. Barnes et al., "Improved treatment
> of exact exchange in Quantum ESPRESSO", Computer Physics Communications,
> May 31, 2017. From what I understand, the starting calculation (without
> exact exchange) is performed as usual, with no band parallelization, while
> parallelization over pairs of bands is applied to the exact-exchange
> calculation. I am forwarding this to Taylor and Thorsten, who may know
> better than I do (Thorsten: I haven't forgotten your patch of two months
> ago! It is in the pipeline of things to be done).
>
> Paolo
>
> On Mon, Jul 10, 2017 at 9:55 PM, Ryan McAvoy <mcavor11 at gmail.com> wrote:
>
>> Hello,
>>
>> I am Ryan L. McAvoy, a PhD student in Giulia Galli's group. I am trying
>> to use the band parallelization for hybrids in QE 6.1, and I am finding
>> unexpected behavior. I have created a test case on a small system (the
>> zinc dimer) to illustrate the following behavior, and I have attached
>> those files.
>>
>>
>>    1. The message informing the user that band parallelization is active
>>    for a hybrid functional appears to be broken: changing the number of
>>    band groups with -nbgrp does not trigger the output that subroutine
>>    parallel_info() in environment.f90 should print, which suggests the code
>>    believes nbgrp to be 1. This may be caused by the statement
>>    "mp_start_bands(1 ,...." at line 94 of mp_global.f90, since I have
>>    checked that "nband_" has the correct value after
>>    "CALL get_command_line()" in mp_global.f90.
>>    2. The plane waves appear to be distributed over all of the processors,
>>    even for large values of "nband_". I have checked that this is not
>>    merely an output error by printing lda (npw) at each call of h_psi; the
>>    value matches exactly what one would expect from dividing the total
>>    number of plane waves by the number of processors (plus a factor of 1/2
>>    for gamma tricks).
>>    3. Behavior 2 prevents me from scaling to as many processors as I could
>>    with QE 6.0. Using QE 6.0 hybrids on C60, I could run on 8000+
>>    processors on the BG/Q machine Cetus at Argonne National Lab, but with
>>    QE 6.1 the output says that it has run out of plane waves, even with a
>>    large number of band groups. (I have demonstrated this behavior below on
>>    the zinc dimer on 640 Intel processors to aid reproducibility.)
>>
>> Is #2 the intended behavior for this new parallelization method?
>>
>> Thank you for your time and attention to this matter,
>> Ryan L. McAvoy
>>
>> ............................................................
>> ............................................................
>> ...........................................................
>>
>>
>> My run scripts are of the form:
>>
>> module load mkl/11.2
>> module load intelmpi/5.0+intel-15.0
>>
>> QE_BIN_DIR=PUTPATHHERE/qe-6.1/bin
>>
>> export MPI_TASKS=$SLURM_NTASKS
>>
>> exe=${QE_BIN_DIR}/pw.x
>>
>> export OMP_NUM_THREADS=1
>>
>> nband=10
>> mpirun -n $MPI_TASKS ${exe} -nb $nband < ${fileVal}.in \
>>   > ${fileVal}_nband${nband}_nproc${MPI_TASKS}.out
>>
>>
>>
>> _______________________________________________
>> Q-e-developers mailing list
>> Q-e-developers at qe-forge.org
>> http://qe-forge.org/mailman/listinfo/q-e-developers
>>
>>
>
>
> --
> Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
> Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
> Phone +39-0432-558216, fax +39-0432-558222
>
>