[Q-e-developers] Fwd: [Q-e-commits] r11428 - trunk/espresso/PW/src

Filippo Spiga spiga.filippo at gmail.com
Wed Mar 11 12:37:14 CET 2015


Hi Paolo,

I incorporated these changes in my codebase and compared the outputs against a reference set of outputs generated by test-code based on revision 11426. I used consistent environment (no MPI, no OpenMP, MKL serial, GCC 4.4.7).

Data is here: http://filippospiga.info/out.20150311_112145

F

Begin forwarded message:
> From: giannozz <giannozz at qeforge.qe-forge.org>
> Subject: [Q-e-commits] r11428 - trunk/espresso/PW/src
> Date: March 9, 2015 at 8:59:57 PM GMT
> To: q-e-commits at qe-forge.org
> Reply-To: Quantum ESPRESSO svn commit messages <q-e-commits at qe-forge.org>
> 
> Author: giannozz
> Date: 2015-03-09 21:59:57 +0100 (Mon, 09 Mar 2015)
> New Revision: 11428
> 
> Modified:
>   trunk/espresso/PW/src/addusdens.f90
>   trunk/espresso/PW/src/newd.f90
>   trunk/espresso/PW/src/print_clock_pw.f90
> Log:
> Speedup of ultrasoft PPs. 1) newd rewritten following Anton's suggestion with
> DGEMM's 2) addusdens rewritten in a more sensible way. Significant speedup, at
> the price of some sizable allocation of work memory. The latter should however
> not affect the overall memory requirement of a job. Please test.
> 
> 
> Modified: trunk/espresso/PW/src/addusdens.f90
> ===================================================================
> --- trunk/espresso/PW/src/addusdens.f90	2015-03-08 20:38:02 UTC (rev 11427)
> +++ trunk/espresso/PW/src/addusdens.f90	2015-03-09 20:59:57 UTC (rev 11428)
> @@ -1,5 +1,5 @@
> !
> -! Copyright (C) 2001-2013 Quantum ESPRESSO group
> +! Copyright (C) 2001-2015 Quantum ESPRESSO group
> ! This file is distributed under the terms of the
> ! GNU General Public License. See the file `License'
> ! in the root directory of the present distribution,
> @@ -61,16 +61,16 @@
>   !     here the local variables
>   !
> 
> -  integer :: ig, na, nt, ih, jh, ijh, is
> +  integer :: ig, na, nt, ih, jh, ijh, is, nab, nij
>   ! counters
> 
> -  real(DP) :: tbecsum(nspin_mag)  
> +  real(DP), allocatable :: tbecsum(:,:)
>   real(DP), allocatable :: qmod (:), ylmk0 (:,:)
>   ! the modulus of G
>   ! the spherical harmonics
> 
> -  complex(DP) :: skk
> -  complex(DP), allocatable ::  aux (:,:), qgm(:)
> +  complex(DP), allocatable :: skk(:), aux2(:,:)
> +  complex(DP), allocatable ::  aux (:,:), qgm(:,:)
>   ! work space for rho(G,nspin)
>   ! Fourier transform of q
> 
> @@ -80,64 +80,67 @@
> 
>   allocate (aux ( ngm, nspin_mag))    
>   allocate (qmod( ngm))    
> -  allocate (qgm( ngm))    
>   allocate (ylmk0( ngm, lmaxq * lmaxq))    
> +  ALLOCATE ( skk(ngm), aux2(ngm,nspin_mag) )
> 
>   aux (:,:) = (0.d0, 0.d0)
>   call ylmr2 (lmaxq * lmaxq, ngm, g, gg, ylmk0)
>   do ig = 1, ngm
>      qmod (ig) = sqrt (gg (ig) )
>   enddo
> +  !
>   do nt = 1, ntyp
>      if ( upf(nt)%tvanp ) then
> +        !
> +        ! nij = max number of (ih,jh) pairs per atom type nt
> +        !
> +        nij = nh(nt)*(nh(nt)+1)/2
> +        !
> +        allocate (qgm(ngm,nij), tbecsum(nij,nspin_mag) )
>         ijh = 0
>         do ih = 1, nh (nt)
>            do jh = ih, nh (nt)
> -#ifdef DEBUG_ADDUSDENS
> -  call start_clock ('addus:qvan2')
> -#endif
> -              call qvan2 (ngm, ih, jh, nt, qmod, qgm, ylmk0)
> -#ifdef DEBUG_ADDUSDENS
> -  call stop_clock ('addus:qvan2')
> -#endif
>               ijh = ijh + 1
> -              do na = 1, nat
> -                 if (ityp (na) .eq.nt) then
> -                    !
> -                    !  Multiply becsum and qg with the correct structure factor
> -                    tbecsum(1:nspin_mag) = becsum(ijh,na,1:nspin_mag)
> -                    !
> -#ifdef DEBUG_ADDUSDENS
> -  call start_clock ('addus:aux')
> -#endif
> -
> -                    do is = 1, nspin_mag
> -!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(skk, ig)
> -                       do ig = 1, ngm
> -                          skk = eigts1 (mill (1,ig), na) * &
> -                                eigts2 (mill (2,ig), na) * &
> -                                eigts3 (mill (3,ig), na)
> -                          aux(ig,is)=aux(ig,is) + qgm(ig)*skk*tbecsum(is)
> -                       enddo
> -!$OMP END PARALLEL DO
> -                    enddo
> -
> -#ifdef DEBUG_ADDUSDENS
> -  call stop_clock ('addus:aux')
> -#endif
> -                 endif
> +              call qvan2 (ngm, ih, jh, nt, qmod, qgm(1,ijh), ylmk0)
> +           end do
> +        end do
> +        !
> +        do na = 1, nat
> +           IF ( ityp(na) == nt ) THEN
> +              !
> +              tbecsum(:,:) = becsum(1:nij,na,1:nspin_mag)
> +              !
> +!$omp parallel do default(shared) private(ig)
> +              do ig = 1, ngm
> +                 skk(ig) = eigts1 (mill (1,ig), na) * &
> +                           eigts2 (mill (2,ig), na) * &
> +                           eigts3 (mill (3,ig), na)
> +              end do
> +!$omp end parallel do
> +              CALL dgemm( 'N', 'N', 2*ngm, nspin_mag, nij, 1.0_dp, qgm, 2*ngm,&
> +                          tbecsum, nij, 0.0_dp, aux2, 2*ngm )
> +              do is = 1, nspin_mag
> +!$omp parallel do default(shared) private(ig)
> +                 do ig = 1, ngm
> +                    aux(ig,is)=aux(ig,is) + aux2(ig,is)*skk(ig)
> +                 enddo
> +!$omp end parallel do
>               enddo
> -           enddo
> +           endif
>         enddo
> +        deallocate (tbecsum, qgm)
>      endif
>   enddo
>   !
> +  deallocate (aux2, skk)
>   deallocate (ylmk0)
> -  deallocate (qgm)
>   deallocate (qmod)
>   !
>   !     convert aux to real space and add to the charge density
>   !
> +#ifdef DEBUG_ADDUSDENS
> +  call start_clock ('addus:fft')
> +#endif
>   do is = 1, nspin_mag
>      psic(:) = (0.d0, 0.d0)
>      psic( nl(:) ) = aux(:,is)
> @@ -145,6 +148,9 @@
>      CALL invfft ('Dense', psic, dfftp)
>      rho(:, is) = rho(:, is) +  DBLE (psic (:) )
>   enddo
> +#ifdef DEBUG_ADDUSDENS
> +  call stop_clock ('addus:fft')
> +#endif
>   deallocate (aux)
> 
>   call stop_clock ('addusdens')
> 
> Modified: trunk/espresso/PW/src/newd.f90
> ===================================================================
> --- trunk/espresso/PW/src/newd.f90	2015-03-08 20:38:02 UTC (rev 11427)
> +++ trunk/espresso/PW/src/newd.f90	2015-03-09 20:59:57 UTC (rev 11428)
> @@ -1,5 +1,5 @@
> !
> -! Copyright (C) 2001-2013 Quantum ESPRESSO group
> +! Copyright (C) 2001-2015 Quantum ESPRESSO group
> ! This file is distributed under the terms of the
> ! GNU General Public License. See the file `License'
> ! in the root directory of the present distribution,
> @@ -70,39 +70,28 @@
>   REAL(kind=dp), intent(out) :: deeq( nhm, nhm, nat, nspin )
>   LOGICAL, intent(in) :: skip_vltot !If .false. vltot is added to vr when necessary
>   ! INTERNAL
> -  INTEGER :: ig, nt, ih, jh, na, is, nht, nb, mb
> +  INTEGER :: ig, nt, ih, jh, na, is, ijh, nij, nb, nab
>   ! counters on g vectors, atom type, beta functions x 2,
>   !   atoms, spin, aux, aux, beta func x2 (again)
> -#ifdef __OPENMP
> -  INTEGER :: mytid, ntids, omp_get_thread_num, omp_get_num_threads
> -#endif
> -  COMPLEX(DP), ALLOCATABLE :: aux(:,:), qgm(:), qgm_na(:)
> +  COMPLEX(DP), ALLOCATABLE :: vaux(:,:), aux(:,:), qgm(:,:)
>     ! work space
> -  COMPLEX(DP) :: dtmp
> -  REAL(DP), ALLOCATABLE :: ylmk0(:,:), qmod(:)
> +  REAL(DP), ALLOCATABLE :: ylmk0(:,:), qmod(:), deeaux(:,:)
>     ! spherical harmonics, modulus of G
> -  REAL(DP) :: ddot
> -  INTEGER :: fact
> -
> +  REAL(DP) :: fact
> +  !
> +  CALL start_clock( 'newd' )
> +  !
>   IF ( gamma_only ) THEN
> -     !
> -     fact = 2
> -     !
> +     fact = 2.0_dp
>   ELSE
> -     !
> -     fact = 1
> -     !
> +     fact = 1.0_dp
>   END IF
>   !
> -  CALL start_clock( 'newd' )
> -  !
> -  ALLOCATE( aux( ngm, nspin_mag ),  &
> -            qgm( ngm ), qmod( ngm ), ylmk0( ngm, lmaxq*lmaxq ) )
> -  !
>   deeq(:,:,:,:) = 0.D0
>   !
> -  CALL ylmr2( lmaxq * lmaxq, ngm, g, gg, ylmk0 )
> +  ALLOCATE( vaux(ngm,nspin_mag), qmod( ngm ), ylmk0( ngm, lmaxq*lmaxq ) )
>   !
> +  CALL ylmr2( lmaxq * lmaxq, ngm, g, gg, ylmk0 )
>   qmod(1:ngm) = SQRT( gg(1:ngm) )
>   !
>   ! ... fourier transform of the total effective potential
> @@ -110,102 +99,106 @@
>   DO is = 1, nspin_mag
>      !
>      IF ( (nspin_mag == 4 .AND. is /= 1) .or. skip_vltot ) THEN 
> -        !
> -        psic(:) = vr(:,is)
> -        !
> +!$omp parallel do default(shared) private(ig)
> +        do ig=1,dfftp%nnr
> +           psic(ig) = vr(ig,is)
> +        end do
> +!$omp end parallel do
>      ELSE
> -        !
> -        psic(:) = vltot(:) + vr(:,is)
> -        !
> +!$omp parallel do default(shared) private(ig)
> +        do ig=1,dfftp%nnr
> +           psic(ig) = vltot(ig) + vr(ig,is)
> +        end do
> +!$omp end parallel do
>      END IF
> -     !
>      CALL fwfft ('Dense', psic, dfftp)
> +!$omp parallel do default(shared) private(ig)
> +        do ig=1,ngm
> +           vaux(ig, is) = psic(nl(ig))
> +        end do
> +!$omp end parallel do
>      !
> -     aux(1:ngm,is) = psic( nl(1:ngm) )
> -     !
>   END DO
> -  !
> -  ! ... here we compute the integral Q*V for each atom,
> -  ! ...       I = sum_G exp(-iR.G) Q_nm v^*
> -  !
> +
>   DO nt = 1, ntyp
>      !
>      IF ( upf(nt)%tvanp ) THEN
>         !
> +        ! nij = max number of (ih,jh) pairs per atom type nt
> +        !
> +        nij = nh(nt)*(nh(nt)+1)/2
> +        ALLOCATE ( qgm(ngm,nij) )
> +        !
> +        ! ... Compute and store Q(G) for this atomic species 
> +        ! ... (without structure factor)
> +        !
> +        ijh = 0
>         DO ih = 1, nh(nt)
> -           !
>            DO jh = ih, nh(nt)
> -              !
> -              ! ... The Q(r) for this atomic species without structure factor
> -              !
> -              CALL qvan2( ngm, ih, jh, nt, qmod, qgm, ylmk0 )
> -              !
> -#ifdef __OPENMP
> -!$omp parallel default(shared), private(na,qgm_na,is,dtmp,ig,mytid,ntids)
> -              mytid = omp_get_thread_num()  ! take the thread ID
> -              ntids = omp_get_num_threads() ! take the number of threads
> -#endif
> -              ALLOCATE(  qgm_na( ngm ) )
> -              !
> -              DO na = 1, nat
> -                 !
> -#ifdef __OPENMP
> -                 ! distribute atoms round robin to threads
> -                 !
> -                 IF( MOD( na, ntids ) /= mytid ) CYCLE
> -#endif
> -                 !
> -                 IF ( ityp(na) == nt ) THEN
> -                    !
> -                    ! ... The Q(r) for this specific atom
> -                    !
> -                    qgm_na(1:ngm) = qgm(1:ngm) * eigts1(mill(1,1:ngm),na) &
> -                                               * eigts2(mill(2,1:ngm),na) &
> -                                               * eigts3(mill(3,1:ngm),na)
> -                    !
> -                    ! ... and the product with the Q functions
> -                    !
> -                    DO is = 1, nspin_mag
> -                       !
> -#ifdef __OPENMP
> -                       dtmp = 0.0d0
> -                       DO ig = 1, ngm
> -                          dtmp = dtmp + aux( ig, is ) * CONJG( qgm_na( ig ) )
> -                       END DO
> -#else
> -                       dtmp = ddot( 2 * ngm, aux(1,is), 1, qgm_na, 1 )
> -#endif
> -                       deeq(ih,jh,na,is) = fact * omega * DBLE( dtmp )
> -                       !
> -                       IF ( gamma_only .AND. gstart == 2 ) &
> -                           deeq(ih,jh,na,is) = deeq(ih,jh,na,is) - &
> -                                           omega * DBLE( aux(1,is) * qgm_na(1) )
> -                       !
> -                       deeq(jh,ih,na,is) = deeq(ih,jh,na,is)
> -                       !
> +              ijh = ijh + 1
> +              CALL qvan2 ( ngm, ih, jh, nt, qmod, qgm(1,ijh), ylmk0 )
> +           END DO
> +        END DO
> +        !
> +        ! count max number of atoms of type nt
> +        !
> +        nab = 0
> +        DO na = 1, nat
> +           IF ( ityp(na) == nt ) nab = nab + 1
> +        END DO
> +        ALLOCATE ( aux (ngm, nab ), deeaux(nij, nab) )
> +        !
> +        ! ... Compute and store V(G) times the structure factor e^(-iG*tau)
> +        !
> +        DO is = 1, nspin_mag
> +           nb = 0
> +           DO na = 1, nat
> +              IF ( ityp(na) == nt ) THEN
> +                 nb = nb + 1
> +!$omp parallel do default(shared) private(ig)
> +                 do ig=1,ngm
> +                    aux(ig, nb) = vaux(ig,is) * CONJG ( &
> +                      eigts1(mill(1,ig),na) * &
> +                      eigts2(mill(2,ig),na) * &
> +                      eigts3(mill(3,ig),na) )
> +                 end do
> +!$omp end parallel do
> +              END IF
> +           END DO
> +           !
> +           ! ... here we compute the integral Q*V for all atoms of this kind
> +           !
> +           CALL DGEMM( 'C', 'N', nij, nab, 2*ngm, fact, qgm, 2*ngm, aux, &
> +                    2*ngm, 0.0_dp, deeaux, nij )
> +           IF ( gamma_only .AND. gstart == 2 ) &
> +                CALL DGER(nij, nab,-1.0_dp, qgm, 2*ngm, aux, 2*ngm, deeaux, nij)
> +           !
> +           nb = 0
> +           DO na = 1, nat
> +              IF ( ityp(na) == nt ) THEN
> +                 nb = nb + 1
> +                 ijh = 0
> +                 DO ih = 1, nh(nt)
> +                    DO jh = ih, nh(nt)
> +                       ijh = ijh + 1
> +                       deeq(ih,jh,na,is) = omega * deeaux(ijh,nb)
> +                       if (jh > ih) deeq(jh,ih,na,is) = deeq(ih,jh,na,is)
>                     END DO
> -                    !
> -                 END IF
> -                 !
> -              END DO
> -              !
> -              DEALLOCATE( qgm_na )
> -#ifdef __OPENMP
> -!$omp end parallel
> -#endif
> -              !
> +                 END DO
> +              END IF
>            END DO
>            !
>         END DO
>         !
> +        DEALLOCATE ( deeaux, aux, qgm )
> +        !
>      END IF
>      !
>   END DO
>   !
> +  DEALLOCATE( qmod, ylmk0, vaux )
>   CALL mp_sum( deeq( :, :, :, 1:nspin_mag ), intra_bgrp_comm )
>   !
> -  DEALLOCATE( aux, qgm, qmod, ylmk0 )
> -  !
> END SUBROUTINE newq_compute
> !---------------------------------------
> SUBROUTINE newd()
> 
> Modified: trunk/espresso/PW/src/print_clock_pw.f90
> ===================================================================
> --- trunk/espresso/PW/src/print_clock_pw.f90	2015-03-08 20:38:02 UTC (rev 11427)
> +++ trunk/espresso/PW/src/print_clock_pw.f90	2015-03-09 20:59:57 UTC (rev 11428)
> @@ -1,5 +1,5 @@
> !
> -! Copyright (C) 2001-2006 Quantum ESPRESSO group
> +! Copyright (C) 2001-2015 Quantum ESPRESSO group
> ! This file is distributed under the terms of the
> ! GNU General Public License. See the file `License'
> ! in the root directory of the present distribution,
> @@ -81,15 +81,11 @@
>       CALL print_clock( 'wfcrot' )
>    ENDIF
>    !
> -   IF ( iverbosity > 0)  THEN
> +   !IF ( iverbosity > 0)  THEN
>       WRITE( stdout, '(/5x,"Called by sum_band:")' )
>       CALL print_clock( 'sum_band:becsum' )
>       CALL print_clock( 'addusdens' )
> -      CALL print_clock( 'addus:qvan2' )
> -      CALL print_clock( 'addus:strf' )
> -      CALL print_clock( 'addus:aux2' )
> -      CALL print_clock( 'addus:aux' )
> -   ENDIF
> +   !ENDIF
>    !
>    IF ( isolve == 0 ) THEN
>       WRITE( stdout, '(/5x,"Called by *egterg:")' )
> 
> _______________________________________________
> Q-e-commits mailing list
> Q-e-commits at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-commits

--
Mr. Filippo SPIGA, M.Sc.
http://filippospiga.info ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert

*****
Disclaimer: "Please note this message and any attachments are CONFIDENTIAL and may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission."






More information about the developers mailing list