[Q-e-developers] Improvements to the EXX Implementation

Mon May 9 12:13:03 CEST 2016

Great. About the last question: option 2 seems to me by far the best, as
long as the new algorithm is consistently better than the old one and can
replace it.

About the format of the patch: I would go for a patch against the latest svn
(or git mirror).

Paolo

On Mon, May 9, 2016 at 4:21 AM, Taylor Barnes <tbarnes at lbl.gov> wrote:

> Hi All,
>
>    We've been continuing to refine our improvements to the exact exchange
> parts of the code.  I have attached some recent timings we obtained for a
> single-point energy SCF calculation on a box of 64 water molecules, running
> on NERSC's Edison system.  Our code parallelizes substantially better, and
> enables us to achieve walltimes that are about 5x shorter than the best
> walltimes we could obtain with the old code.
>
>    One thing that we are wondering about is the best way to go about
> incorporating these modifications into the current development
> branch.  Should we submit our changes to this list in a patch, or do we
> need access to the git repository?
>
>    We are also interested in knowing what would be the minimum
> requirements for an initial patch.  In its current state, our code is not
> compatible with certain input options - the biggest examples being that our
> code does not support the gamma-point only and non-collinear options.
> Running an exact exchange calculation with either of these options will
> currently crash in our version of the code.  For the purpose of submitting
> an initial patch, I can see three different ways of handling this issue,
> which I describe below in increasing order of complexity:
>
>    1. Add error messages that prevent gamma-point only / non-collinear
> calculations from being used in combination with exact exchange, with the
> understanding that these features will be re-enabled in a future patch.
>    2. Modify the code such that exact exchange calculations using one of
> the non-supported options fall back on old versions of the subroutines that
> we have improved, with the understanding that support for these options in
> the updated subroutines will be added in a future patch.
>    3. Wait until our updated code fully supports all possible user input
> combinations before submitting a patch.
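The difference between options 1 and 2 above can be illustrated with a minimal sketch. It is written in Python only for brevity and is not taken from the actual code: `ExxInput`, `select_vexx`, the `policy` argument, and the labels `"vexx_new"` / `"vexx_old"` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class ExxInput:
    """Hypothetical subset of input flags relevant to the choice of routine."""
    gamma_only: bool = False
    noncolin: bool = False


def unsupported_reason(inp):
    """Return a description of the unsupported option in use, or None."""
    if inp.gamma_only:
        return "gamma-point only"
    if inp.noncolin:
        return "non-collinear"
    return None


def select_vexx(inp, policy):
    """Choose which exchange routine to run (labels are hypothetical).

    policy="error"    -> option 1: refuse unsupported combinations.
    policy="fallback" -> option 2: use the old routine for them.
    """
    reason = unsupported_reason(inp)
    if reason is None:
        return "vexx_new"
    if policy == "error":
        raise NotImplementedError(
            f"exact exchange with {reason} is not yet supported"
        )
    if policy == "fallback":
        return "vexx_old"
    raise ValueError(f"unknown policy: {policy!r}")
```

With `policy="fallback"`, a gamma-point only or non-collinear run would still complete using the old routine, whereas `policy="error"` stops it with a clear message instead of a crash.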
>
>    Which of these options would satisfy the minimum requirements for
> submission of a patch?  Our interest in this question is primarily
> motivated by a desire to see our modifications be incorporated into the
> code as soon as possible.
>
> Best Regards,
> Taylor Barnes
>
>
> On Mon, Apr 4, 2016 at 10:18 PM, Filippo Spiga <
> filippo.spiga at quantum-espresso.org> wrote:
>
>> Hello Taylor,
>>
>> thanks for sharing the progress of your work with us. If you want
>> these contributions to be merged into v5.4, please remember to provide us
>> with a patch or the relevant files by the weekend of April 23-24.
>>
>> Best Regards
>>
>> --
>> Filippo SPIGA
>> * Sent from my iPhone, sorry for typos *
>>
>> > On 4 Apr 2016, at 21:05, Taylor Barnes <tbarnes at lbl.gov> wrote:
>> >
>> > Dear All,
>> >
>> >    I wanted to inform everyone about some improvements that we have
>> been making at LBNL to the implementation of exact exchange in QE.  These
>> improvements have been made as part of NERSC's Exascale Science
>> Applications Program, which is an effort to update codes for execution on
>> next-generation architectures such as NERSC's upcoming Cori Phase II.  The
>> following is a brief overview of these changes, which we are currently in
>> the process of testing and debugging.  Depending on our progress, we intend
>> to submit these changes as an addition to either QE 5.4 or 6.0.
>> >
>> > 1. Parallelization Over Band Pairs
>> >    We have extended the parallelization of subroutine vexx_k such that
>> both of the loops over bands (i.e., "LOOP_ON_PSI_BANDS" and "IBND_LOOP_K")
>> are parallelized with respect to band groups.  This improves load
>> balancing, and also enables parallelization using larger numbers of band
>> groups than was previously possible.
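Why distributing both band loops helps load balancing can be seen in a small counting sketch. It is an illustration under assumptions, not the scheme used in vexx_k: it assumes that previously only one of the two loops was distributed, and that work is assigned round-robin. The function names are hypothetical.

```python
from itertools import product


def pairs_inner_only(n_psi, n_occ, n_groups):
    """Band pairs per band group when only the inner loop is distributed."""
    counts = [0] * n_groups
    for _ibnd, jbnd in product(range(n_psi), range(n_occ)):
        # Round-robin over the inner band index only (assumed scheme).
        counts[jbnd % n_groups] += 1
    return counts


def pairs_both_loops(n_psi, n_occ, n_groups):
    """Band pairs per band group when the flattened pair index is distributed."""
    counts = [0] * n_groups
    for pair_index, _pair in enumerate(product(range(n_psi), range(n_occ))):
        # Round-robin over all (ibnd, jbnd) pairs (assumed scheme).
        counts[pair_index % n_groups] += 1
    return counts


if __name__ == "__main__":
    print(pairs_inner_only(3, 5, 4))   # [6, 3, 3, 3]
    print(pairs_both_loops(3, 5, 4))   # [4, 4, 4, 3]
    print(pairs_inner_only(3, 5, 8))   # [3, 3, 3, 3, 3, 0, 0, 0]
    print(pairs_both_loops(3, 5, 8))   # [2, 2, 2, 2, 2, 2, 2, 1]
```

In this toy case the busiest group drops from 6 pairs to 4 with 4 band groups, and with 8 band groups (more than the 5 inner bands) no group is left idle once pairs rather than single bands are distributed.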
>> >
>> > 2. Improved OMP Support
>> >    We have added OMP threading to numerous vector operations within
>> exx.f90.  In addition, we have given special priority to enhancing the
>> threaded performance of the FFTs.
>> >
>> > 3. Implementation of Different and Interchangeable Data Layouts for
>> Local and EXX Portions of the Calculation
>> >    One observation that we have made is that for calculations that
>> utilize many band groups, the local portion of the calculation (i.e.,
>> everything outside of exx.f90) often represents a non-negligible (or even
>> dominant) contribution to the total cost of the calculation.  This is
>> largely because the local portion of the calculation is duplicated on each
>> band group.  We have implemented changes to the code that allow the local
>> portion of the code to be parallelized in a manner that is independent of
>> the number of band groups, thus avoiding duplication of work.
>> >    This is the single most significant modification that we have made,
>> both in terms of increasing the efficiency of QE, as well as the amount of
>> coding work required.  For several test calculations we are finding that
>> this change results in more than a factor of two speedup.
>> >    In terms of code development, the primary challenge of our approach
>> is that when the EXX part of the calculation is performed (such as when
>> vexx is called), we must change the data structure from the one that is
>> used by the local portion of the code to a different data structure that is
>> used by the EXX portion of the code.  This change of data structure
>> requires a great deal of bookkeeping in order to update arrays like igk,
>> ig_l2g, psi, and hpsi.  As a result, we are still in the process of making
>> our updated code compatible with gamma-point only calculations and with
>> calculations that employ multiple k-points.
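The cost of duplicating the local portion on every band group can be sketched with an idealized model. The model is an assumption made for illustration: it assumes perfect scaling, ignores communication and the cost of switching data layouts, and uses made-up numbers rather than measured timings. The function names are hypothetical.

```python
def local_time_duplicated(t_local, n_procs, n_band_groups):
    """Local work repeated on each band group of n_procs / n_band_groups."""
    procs_per_group = n_procs / n_band_groups
    return t_local / procs_per_group


def local_time_independent(t_local, n_procs):
    """Local work spread over all processes, regardless of band groups."""
    return t_local / n_procs


def total_time(t_local, t_exx, n_procs, n_band_groups, independent_layout):
    """Idealized walltime: local part plus perfectly scaling EXX part."""
    t_exx_parallel = t_exx / n_procs
    if independent_layout:
        return local_time_independent(t_local, n_procs) + t_exx_parallel
    return local_time_duplicated(t_local, n_procs, n_band_groups) + t_exx_parallel


if __name__ == "__main__":
    # Illustrative serial costs only, not measurements.
    old = total_time(64.0, 960.0, n_procs=64, n_band_groups=16,
                     independent_layout=False)
    new = total_time(64.0, 960.0, n_procs=64, n_band_groups=16,
                     independent_layout=True)
    print(old, new)  # 31.0 16.0
```

In this model the duplicated local term grows linearly with the number of band groups at a fixed process count, which is why it can come to dominate a calculation whose serial cost is mostly exact exchange.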
>> >
>> > Sincerely,
>> > Dr. Taylor Barnes
>> > Postdoctoral Scholar,
>> > Lawrence Berkeley National Laboratory
>> > _______________________________________________
>> > Q-e-developers mailing list
>> > Q-e-developers at qe-forge.org
>> > http://qe-forge.org/mailman/listinfo/q-e-developers

-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222