[Q-e-developers] Improvements to the EXX Implementation

Tue Apr 5 16:15:29 CEST 2016

Dear Taylor,

first of all, it is great to have active contributors at LBNL!

One curiosity - did you have any interaction with Lin Lin? This
work seems intriguing: http://arxiv.org/pdf/1601.07159.pdf

			nicola

On 05/04/2016 06:05, Taylor Barnes wrote:
> Dear All,
>
>     I wanted to inform everyone about some improvements that we have
> been making at LBNL to the implementation of exact exchange in QE.
> These improvements have been made as part of NERSC's Exascale Science
> Applications Program, which is an effort to update codes for execution
> on next-generation architectures such as NERSC's upcoming Cori Phase
> II.  The following is a brief overview of these changes, which we are
> currently in the process of testing and debugging.  Depending on our
> progress, we intend to submit these changes as an addition to either QE
> 5.4 or 6.0.
>
> 1. Parallelization Over Band Pairs
>     We have extended the parallelization of subroutine vexx_k such that
> both of the loops over bands (i.e., "LOOP_ON_PSI_BANDS" and
> "IBND_LOOP_K") are parallelized with respect to band groups.  This
> improves load balancing, and also enables parallelization using larger
> numbers of band groups than was previously possible.
>
> 2. Improved OMP Support
>     We have added OMP threading to numerous vector operations within
> exx.f90.  In addition, we have given special priority to enhancing the
> threaded performance of the FFTs.
>
> 3. Implementation of Different and Interchangeable Data Layouts for
> Local and EXX Portions of the Calculation
>     One observation that we have made is that for calculations that
> utilize many band groups, the local portion of the calculation (i.e.,
> everything outside of exx.f90) often represents a non-negligible (or
> even dominant) contribution to the total cost of the calculation.  This
> is largely because the local portion of the calculation is duplicated on
> each band group.  We have implemented changes to the code that allow the
> local portion of the code to be parallelized in a manner that is
> independent of the number of band groups, thus avoiding duplication of work.
>     This is the single most significant modification that we have made,
> both in terms of the resulting increase in the efficiency of QE and in
> terms of the amount of coding work required.  For several test
> calculations we are finding that this change results in more than a
> factor of two speedup.
>     In terms of code development, the primary challenge of our approach
> is that when the EXX part of the calculation is performed (such as when
> vexx is called), we must change the data structure from the one that is
> used by the local portion of the code to a different data structure that
> is used by the EXX portion of the code.  This change of data structure
> requires a great deal of bookkeeping in order to update arrays like igk,
> ig_l2g, psi, and hpsi.  As a result, we are still in the process of making
> our updated code compatible with gamma-point only calculations and with
> calculations that employ multiple k-points.
>
> Sincerely,
> Dr. Taylor Barnes
> Postdoctoral Scholar,
> Lawrence Berkeley National Laboratory
>
>
>
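
As a rough illustration of item 1 in the quoted message, the toy model below counts how many band pairs each band group would handle under two schemes. It assumes, purely for illustration, that the earlier scheme split only one of the two band loops across groups and that work is dealt out round-robin. The function names, the round-robin rule, and the band and group counts are all hypothetical; nothing here is taken from exx.f90.

```python
from itertools import product


def outer_only_loads(n_psi, n_occ, n_groups):
    """Pairs per group when only the outer band loop is split (assumed baseline)."""
    loads = [0] * n_groups
    for i in range(n_psi):
        # Each outer band brings its whole inner loop to a single group.
        loads[i % n_groups] += n_occ
    return loads


def pair_loads(n_psi, n_occ, n_groups):
    """Pairs per group when every (outer, inner) band pair is dealt out in turn."""
    loads = [0] * n_groups
    for idx, _pair in enumerate(product(range(n_psi), range(n_occ))):
        loads[idx % n_groups] += 1
    return loads


if __name__ == "__main__":
    # Hypothetical sizes: 10 x 10 band pairs spread over 16 band groups.
    n_psi, n_occ, n_groups = 10, 10, 16
    print("outer-only:", outer_only_loads(n_psi, n_occ, n_groups))
    print("band pairs:", pair_loads(n_psi, n_occ, n_groups))
```

With these sizes the outer-only split leaves 6 of the 16 groups idle and gives the busiest group 10 pairs, whereas the pair split gives every group 6 or 7 pairs. That is the sense in which distributing pairs can both balance the load and make use of more band groups than there are bands.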

-- 
----------------------------------------------------------------------
Prof Nicola Marzari, Chair of Theory and Simulation of Materials, EPFL
Director, National Centre for Competence in Research NCCR MARVEL, EPFL
http://theossrv1.epfl.ch/Main/Contact http://nccr-marvel.ch/en/project