[Q-e-developers] Improvements to the EXX Implementation

Taylor Barnes tbarnes at lbl.gov
Mon May 9 04:21:21 CEST 2016


Hi All,

   We've been continuing to refine our improvements to the exact exchange
parts of the code.  I have attached some recent timings we obtained for a
single-point energy SCF calculation on a box of 64 water molecules, running
on NERSC's Edison system.  Our code parallelizes substantially better, and
enables us to achieve walltimes that are about 5x shorter than the best
walltimes we could obtain with the old code.

   One thing that we are wondering about is the best way to go about
incorporating these modifications into the current development branch of
the code.  Should we submit our changes to this list as a patch, or do we
need access to the git repository?

   We are also interested in knowing what would be the minimum requirements
for an initial patch.  In its current state, our code is not compatible
with certain input options - most notably, it does not support the
gamma-point only and non-collinear options.  Running an exact exchange
calculation with either of these options will currently crash in our
version of the code.  For the purpose of submitting an initial patch, I can
see three different ways of handling this issue, which I describe below in
increasing order of complexity:

   1. Add error messages that prevent gamma-point only / non-collinear
calculations from being used in combination with exact exchange, with the
understanding that these features will be re-enabled in a future patch.
   2. Modify the code such that exact exchange calculations using one of
the non-supported options fall back on old versions of the subroutines that
we have improved, with the understanding that support for these options in
the updated subroutines will be added in a future patch.
   3. Wait until our updated code fully supports all possible user input
combinations before submitting a patch.
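
   As a rough illustration of how the first two options differ, here is a
minimal Python sketch.  It is not QE code: the names RunOptions,
exx_updated, exx_legacy, run_exx_option1 and run_exx_option2 are all
hypothetical stand-ins, and the actual routines are Fortran subroutines.

```python
from dataclasses import dataclass


@dataclass
class RunOptions:
    """Hypothetical container for the two currently unsupported options."""
    gamma_only: bool = False
    noncollinear: bool = False

    def unsupported_by_updated_exx(self) -> bool:
        # True when the updated EXX routines cannot handle this input.
        return self.gamma_only or self.noncollinear


def exx_updated(opts: RunOptions) -> str:
    # Stand-in for the improved exact exchange routines (assumption).
    return "updated"


def exx_legacy(opts: RunOptions) -> str:
    # Stand-in for the existing exact exchange routines (assumption).
    return "legacy"


def run_exx_option1(opts: RunOptions) -> str:
    """Option 1: stop with a clear error for unsupported combinations."""
    if opts.unsupported_by_updated_exx():
        raise NotImplementedError(
            "exact exchange is not yet available with gamma-point only "
            "or non-collinear calculations"
        )
    return exx_updated(opts)


def run_exx_option2(opts: RunOptions) -> str:
    """Option 2: fall back on the old routines for unsupported combinations."""
    if opts.unsupported_by_updated_exx():
        return exx_legacy(opts)
    return exx_updated(opts)
```

   Under option 1 the affected inputs stop with an explicit message instead
of crashing, while under option 2 they keep working at the old performance
level, at the cost of keeping both code paths in the source.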

   Which of these options would satisfy the minimum requirements for
submission of a patch?  Our interest in this question is primarily
motivated by a desire to see our modifications incorporated into the
code as soon as possible.

Best Regards,
Taylor Barnes


On Mon, Apr 4, 2016 at 10:18 PM, Filippo Spiga <
filippo.spiga at quantum-espresso.org> wrote:

> Hello Taylor,
>
> thanks for sharing the progress of your work with us. If you want
> these contributions to be merged into v5.4, please remember to provide us a
> patch or the relevant files by the weekend of April 23-24.
>
> Best Regards
>
> --
> Filippo SPIGA
> * Sent from my iPhone, sorry for typos *
>
> > On 4 Apr 2016, at 21:05, Taylor Barnes <tbarnes at lbl.gov> wrote:
> >
> > Dear All,
> >
> >    I wanted to inform everyone about some improvements that we have been
> making at LBNL to the implementation of exact exchange in QE.  These
> improvements have been made as part of NERSC's Exascale Science
> Applications Program, which is an effort to update codes for execution on
> next-generation architectures such as NERSC's upcoming Cori Phase II.  The
> following is a brief overview of these changes, which we are currently in
> the process of testing and debugging.  Depending on our progress, we intend
> to submit these changes as an addition to either QE 5.4 or 6.0.
> >
> > 1. Parallelization Over Band Pairs
> >    We have extended the parallelization of subroutine vexx_k such that
> both of the loops over bands (i.e., "LOOP_ON_PSI_BANDS" and "IBND_LOOP_K")
> are parallelized with respect to band groups.  This improves load
> balancing, and also enables parallelization using larger numbers of band
> groups than was previously possible.
> >
> > 2. Improved OMP Support
> >    We have added OMP threading to numerous vector operations within
> exx.f90.  In addition, we have given special priority to enhancing the
> threaded performance of the FFTs.
> >
> > 3. Implementation of Different and Interchangeable Data Layouts for
> Local and EXX Portions of the Calculation
> >    One observation that we have made is that for calculations that
> utilize many band groups, the local portion of the calculation (i.e.,
> everything outside of exx.f90) often represents a non-negligible (or even
> dominant) contribution to the total cost of the calculation.  This is
> largely because the local portion of the calculation is duplicated on each
> band group.  We have implemented changes to the code that allow the local
> portion of the code to be parallelized in a manner that is independent of
> the number of band groups, thus avoiding duplication of work.
> >    This is the single most significant modification that we have made,
> both in terms of increasing the efficiency of QE and in terms of the amount
> of coding work required.  For several test calculations we are finding that
> this change results in more than a factor of two speedup.
> >    In terms of code development, the primary challenge of our approach
> is that when the EXX part of the calculation is performed (such as when
> vexx is called), we must change the data structure from the one that is
> used by the local portion of the code to a different data structure that is
> used by the EXX portion of the code.  This change of data structure
> requires a great deal of bookkeeping in order to update arrays like igk,
> ig_l2g, psi, and hpsi.  As a result, we are still in the process of making
> our updated code compatible with gamma-point only calculations and with
> calculations that employ multiple k-points.
> >
> > Sincerely,
> > Dr. Taylor Barnes
> > Postdoctoral Scholar,
> > Lawrence Berkeley National Laboratory
> > _______________________________________________
> > Q-e-developers mailing list
> > Q-e-developers at qe-forge.org
> > http://qe-forge.org/mailman/listinfo/q-e-developers
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 64water_timings.png
Type: image/png
Size: 99457 bytes
Desc: not available
URL: <http://lists.quantum-espresso.org/pipermail/developers/attachments/20160508/888efa2e/attachment.png>

