[Q-e-developers] [Q-e-commits] r11407 - trunk/espresso/Modules

Paolo Giannozzi paolo.giannozzi at uniud.it
Tue Feb 24 16:01:28 CET 2015


On Tue, 2015-02-24 at 14:45 +0000, Filippo Spiga wrote:
> Then Intel compiler has a pre-processor macros that contains the
> compiler version called "__ICC". Just set it to be major of 1310 (so
> Intel 14.x minimum is required to use this compiler directive)
> 
> 
> #if defined(__ICC)
> #if __ICC  > 1310
> !dir$ attributes align: 4096 :: yf, aux
> #endif
> #endif

this seems to work (that is: my compiler doesn't crash any longer)
(it seems it has __ICC not defined, though).

P.

> It should fix the problem. I can apply and test it later today if you
> are ok. Carlo, opinions?
> 
> 
> F
> 
> 
> On Feb 24, 2015, at 6:42 AM, Paolo Giannozzi
> <paolo.giannozzi at uniud.it> wrote:
> > It's line 76:
> > !dir$ attributes align: 4096 :: yf, aux
> > that makes my compiler (v.12) cras.
> > 
> > P.
> > On Mon, 2015-02-23 at 22:12 +0000, Filippo Spiga wrote:
> > > Ciao Carlo,
> > > 
> > > I spot that for a old (how old Paolo G?) version of Intel compiler
> > > the fft_parallel.f90 files with your latest changes generate a
> > > internal compiler error. It happens with and without MPI support.
> > > 
> > > See: 
> > > - http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%
> > > 28HYBRID%29
> > > - http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%
> > > 28SERIAL%29/builds/4
> > > - http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%
> > > 28SERIAL%29/builds/4/steps/make%20all/logs/stdio
> > > 
> > > Did you test these changes using which version of Intel?
> > > 
> > > Cheers,
> > > Filippo
> > > 
> > > Begin forwarded message:
> > > > Date: February 23, 2015 at 11:14:16 AM GMT
> > > > From: ccavazzoni <ccavazzoni at qeforge.qe-forge.org>
> > > > To: q-e-commits at qe-forge.org
> > > > Subject: [Q-e-commits] r11407 - trunk/espresso/Modules
> > > > Reply-To: Quantum ESPRESSO svn commit messages
> > > > <q-e-commits at qe-forge.org>
> > > > 
> > > > Author: ccavazzoni
> > > > Date: 2015-02-23 12:14:15 +0100 (Mon, 23 Feb 2015)
> > > > New Revision: 11407
> > > > 
> > > > Modified:
> > > >  trunk/espresso/Modules/fft_parallel.f90
> > > >  trunk/espresso/Modules/fft_scalar.f90
> > > >  trunk/espresso/Modules/wavefunctions.f90
> > > > Log:
> > > > - adding memory alignment directives to have a performance
> > > > improvement on Intel architecure (CPU+Network),
> > > > only meaningful for intel compiler, shold be of no arm for all
> > > > the other
> > > > - Intel DFTI MKL fft interface back in again (with __DFTI) in
> > > > fft_scalar, some issues with the fftw3 interface
> > > > is prevending porting to Xeon PHI processor (no arm to all other
> > > > procs/fft)
> > > > - A split in thress different subs of the general driver
> > > > (tg_cft3s) is added to the fft_parallel module,
> > > > to support software pipelining optimizations, to mask
> > > > communication and data transfer latency.
> > > > 
> > > > 
> > > > Modified: trunk/espresso/Modules/fft_parallel.f90
> > > > ===================================================================
> > > > --- trunk/espresso/Modules/fft_parallel.f90 2015-02-23 10:58:36
> > > > UTC (rev 11406)
> > > > +++ trunk/espresso/Modules/fft_parallel.f90 2015-02-23 11:14:15
> > > > UTC (rev 11407)
> > > > @@ -73,6 +73,7 @@
> > > >  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > >                                           ! specify if you want
> > > > to use task groups parallelization
> > > >  !
> > > > +!dir$ attributes align: 4096 :: yf, aux
> > > >  INTEGER                    :: me_p
> > > >  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> > > >  COMPLEX(DP), ALLOCATABLE   :: yf(:), aux (:)
> > > > @@ -329,4 +330,418 @@
> > > >  !
> > > > END SUBROUTINE tg_cft3s
> > > > !
> > > > +!
> > > > +!
> > > > +!----------------------------------------------------------------------------
> > > > +SUBROUTINE tg_cft3s_z( f, dfft, aux, isgn, use_task_groups )
> > > > +
> > > >  !----------------------------------------------------------------------------
> > > > +  !
> > > > +  USE fft_scalar, ONLY : cft_1z, cft_2xy
> > > > +  USE fft_base,   ONLY : fft_scatter
> > > > +  USE kinds,      ONLY : DP
> > > > +  USE fft_types,  ONLY : fft_dlay_descriptor
> > > > +  USE parallel_include
> > > > +
> > > > +  !
> > > > +  IMPLICIT NONE
> > > > +  !
> > > > +  COMPLEX(DP), INTENT(inout)    :: f( : )  ! array containing
> > > > data to be transformed
> > > > +  COMPLEX(DP), INTENT(inout)   :: aux (:)
> > > > +  TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> > > > +                                           ! descriptor of fft
> > > > data layout
> > > > +  INTEGER, INTENT(in)           :: isgn    ! fft direction
> > > > +  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > +                                           ! specify if you
> > > > want to use task groups parallelization
> > > > +  !
> > > > +  INTEGER                    :: me_p
> > > > +  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> > > > +  COMPLEX(DP), ALLOCATABLE   :: yf(:)
> > > > +  INTEGER                    :: planes( dfft%nr1x )
> > > > +  LOGICAL                    :: use_tg
> > > > +  !
> > > > +  !
> > > > +  IF( present( use_task_groups ) ) THEN
> > > > +     use_tg = use_task_groups
> > > > +  ELSE
> > > > +     use_tg = .false.
> > > > +  ENDIF
> > > > +  !
> > > > +  IF( use_tg .and. .not. dfft%have_task_groups ) &
> > > > +     CALL errore( ' tg_cft3s_x ', ' call requiring task groups
> > > > for a descriptor without task groups ', 1 )
> > > > +  !
> > > > +  n1  = dfft%nr1
> > > > +  n2  = dfft%nr2
> > > > +  n3  = dfft%nr3
> > > > +  nx1 = dfft%nr1x
> > > > +  nx2 = dfft%nr2x
> > > > +  nx3 = dfft%nr3x
> > > > +  !
> > > > +  IF( use_tg ) THEN
> > > > +     ALLOCATE( YF ( dfft%nogrp * dfft%tg_nnr ) )
> > > > +  ENDIF
> > > > +  !
> > > > +  me_p = dfft%mype + 1
> > > > +  !
> > > > +  IF ( isgn > 0 ) THEN
> > > > +     !
> > > > +     IF ( isgn /= 2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) &
> > > > +           CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > +        !
> > > > +        CALL cft_1z( f, dfft%nsp( me_p ), n3, nx3, isgn, aux )
> > > > +        !
> > > > +     ELSE
> > > > +        !
> > > > +        CALL pack_group_sticks()
> > > > +        !
> > > > +        IF( use_tg ) THEN
> > > > +           CALL cft_1z( yf, dfft%tg_nsw( me_p ), n3, nx3, isgn,
> > > > aux )
> > > > +        ELSE
> > > > +           CALL cft_1z( f, dfft%nsw( me_p ), n3, nx3, isgn, aux
> > > > )
> > > > +        ENDIF
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +  ELSE
> > > > +     !
> > > > +     IF ( isgn /= -2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) &
> > > > +           CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > +        !
> > > > +     ENDIF
> > > > +
> > > > +     IF ( isgn /= -2 ) THEN
> > > > +        !
> > > > +        CALL cft_1z( aux, dfft%nsp( me_p ), n3, nx3, isgn, f )
> > > > +        !
> > > > +     ELSE
> > > > +        !
> > > > +        IF( use_tg ) THEN
> > > > +           CALL cft_1z( aux, dfft%tg_nsw( me_p ), n3, nx3,
> > > > isgn, yf )
> > > > +        ELSE
> > > > +           CALL cft_1z( aux, dfft%nsw( me_p ), n3, nx3, isgn, f
> > > > )
> > > > +        ENDIF
> > > > +        !
> > > > +        CALL unpack_group_sticks()
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +  ENDIF
> > > > +  !
> > > > +  IF( use_tg ) THEN
> > > > +     DEALLOCATE( yf )
> > > > +  ENDIF
> > > > +  !
> > > > +  RETURN
> > > > +  !
> > > > +CONTAINS
> > > > +  !
> > > > +  SUBROUTINE pack_group_sticks()
> > > > +
> > > > +     INTEGER                     :: ierr
> > > > +     !
> > > > +     IF( .not. use_tg ) RETURN
> > > > +     !
> > > > +     IF( dfft%tg_rdsp(dfft%nogrp) + dfft%tg_rcv(dfft%nogrp) >
> > > > size( yf ) ) THEN
> > > > +        CALL errore( 'pack_group_sticks' , ' inconsistent size
> > > > ', 1 )
> > > > +     ENDIF
> > > > +     IF( dfft%tg_psdsp(dfft%nogrp) + dfft%tg_snd(dfft%nogrp) >
> > > > size( f ) ) THEN
> > > > +        CALL errore( 'pack_group_sticks', ' inconsistent size
> > > > ', 2 )
> > > > +     ENDIF
> > > > +
> > > > +     CALL start_clock( 'ALLTOALL' )
> > > > +     !
> > > > +     !  Collect all the sticks of the different states,
> > > > +     !  in "yf" processors will have all the sticks of the OGRP
> > > > +
> > > > +#if defined __MPI
> > > > +
> > > > +     CALL MPI_ALLTOALLV( f(1), dfft%tg_snd, dfft%tg_psdsp,
> > > > MPI_DOUBLE_COMPLEX, yf(1), dfft%tg_rcv, &
> > > > +      &                     dfft%tg_rdsp, MPI_DOUBLE_COMPLEX,
> > > > dfft%ogrp_comm, IERR)
> > > > +     IF( ierr /= 0 ) THEN
> > > > +        CALL errore( 'pack_group_sticks', ' alltoall error 1 ',
> > > > abs(ierr) )
> > > > +     ENDIF
> > > > +
> > > > +#endif
> > > > +
> > > > +     CALL stop_clock( 'ALLTOALL' )
> > > > +     !
> > > > +     !YF Contains all ( ~ NOGRP*dfft%nsw(me) ) Z-sticks
> > > > +     !
> > > > +     RETURN
> > > > +  END SUBROUTINE pack_group_sticks
> > > > +
> > > > +  !
> > > > +
> > > > +  SUBROUTINE unpack_group_sticks()
> > > > +     !
> > > > +     !  Bring pencils back to their original distribution
> > > > +     !
> > > > +     INTEGER                     :: ierr
> > > > +     !
> > > > +     IF( .not. use_tg ) RETURN
> > > > +     !
> > > > +     IF( dfft%tg_usdsp(dfft%nogrp) + dfft%tg_snd(dfft%nogrp) >
> > > > size( f ) ) THEN
> > > > +        CALL errore( 'unpack_group_sticks', ' inconsistent size
> > > > ', 3 )
> > > > +     ENDIF
> > > > +     IF( dfft%tg_rdsp(dfft%nogrp) + dfft%tg_rcv(dfft%nogrp) >
> > > > size( yf ) ) THEN
> > > > +        CALL errore( 'unpack_group_sticks', ' inconsistent size
> > > > ', 4 )
> > > > +     ENDIF
> > > > +
> > > > +     CALL start_clock( 'ALLTOALL' )
> > > > +
> > > > +#if defined __MPI
> > > > +     CALL MPI_Alltoallv( yf(1), &
> > > > +          dfft%tg_rcv, dfft%tg_rdsp, MPI_DOUBLE_COMPLEX, f(1),
> > > > &
> > > > +          dfft%tg_snd, dfft%tg_usdsp, MPI_DOUBLE_COMPLEX, dfft%
> > > > ogrp_comm, IERR)
> > > > +     IF( ierr /= 0 ) THEN
> > > > +        CALL errore( 'unpack_group_sticks', ' alltoall error 2
> > > > ', abs(ierr) )
> > > > +     ENDIF
> > > > +#endif
> > > > +
> > > > +     CALL stop_clock( 'ALLTOALL' )
> > > > +
> > > > +     RETURN
> > > > +  END SUBROUTINE unpack_group_sticks
> > > > +  !
> > > > +END SUBROUTINE tg_cft3s_z
> > > > +!
> > > > +!
> > > > +!----------------------------------------------------------------------------
> > > > +SUBROUTINE tg_cft3s_scatter( f, dfft, aux, isgn,
> > > > use_task_groups )
> > > > +
> > > >  !----------------------------------------------------------------------------
> > > > +  !
> > > > +  USE fft_scalar, ONLY : cft_1z, cft_2xy
> > > > +  USE fft_base,   ONLY : fft_scatter
> > > > +  USE kinds,      ONLY : DP
> > > > +  USE fft_types,  ONLY : fft_dlay_descriptor
> > > > +  USE parallel_include
> > > > +
> > > > +  !
> > > > +  IMPLICIT NONE
> > > > +  !
> > > > +  COMPLEX(DP), INTENT(inout)    :: f( : ), aux( : )  ! array
> > > > containing data to be transformed
> > > > +  TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> > > > +                                           ! descriptor of fft
> > > > data layout
> > > > +  INTEGER, INTENT(in)           :: isgn    ! fft direction
> > > > +  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > +                                           ! specify if you
> > > > want to use task groups parallelization
> > > > +  !
> > > > +  INTEGER                    :: me_p
> > > > +  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> > > > +  INTEGER                    :: planes( dfft%nr1x )
> > > > +  LOGICAL                    :: use_tg
> > > > +  !
> > > > +  !
> > > > +  IF( present( use_task_groups ) ) THEN
> > > > +     use_tg = use_task_groups
> > > > +  ELSE
> > > > +     use_tg = .false.
> > > > +  ENDIF
> > > > +  !
> > > > +  IF( use_tg .and. .not. dfft%have_task_groups ) &
> > > > +     CALL errore( ' tg_cft3s ', ' call requiring task groups
> > > > for a descriptor without task groups ', 1 )
> > > > +  !
> > > > +  n1  = dfft%nr1
> > > > +  n2  = dfft%nr2
> > > > +  n3  = dfft%nr3
> > > > +  nx1 = dfft%nr1x
> > > > +  nx2 = dfft%nr2x
> > > > +  nx3 = dfft%nr3x
> > > > +  !
> > > > +  me_p = dfft%mype + 1
> > > > +  !
> > > > +  IF ( isgn > 0 ) THEN
> > > > +     !
> > > > +     IF ( isgn /= 2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) &
> > > > +           CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +     CALL fw_scatter( isgn ) ! forwart scatter from stick to
> > > > planes
> > > > +     !
> > > > +  ELSE
> > > > +     !
> > > > +     IF ( isgn /= -2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) &
> > > > +           CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +     CALL bw_scatter( isgn )
> > > > +     !
> > > > +  ENDIF
> > > > +  !
> > > > +  RETURN
> > > > +  !
> > > > +CONTAINS
> > > > +  !
> > > > +  SUBROUTINE fw_scatter( iopt )
> > > > +
> > > > +        !Transpose data for the 2-D FFT on the x-y plane
> > > > +        !
> > > > +        !NOGRP*dfft%nnr: The length of aux and f
> > > > +        !nr3x: The length of each Z-stick
> > > > +        !aux: input - output
> > > > +        !f: working space
> > > > +        !isgn: type of scatter
> > > > +        !dfft%nsw(me) holds the number of Z-sticks proc. me
> > > > has.
> > > > +        !dfft%npp: number of planes per processor
> > > > +        !
> > > > +     !
> > > > +     USE fft_base, ONLY: fft_scatter
> > > > +     !
> > > > +     INTEGER, INTENT(in) :: iopt
> > > > +     !
> > > > +     IF( iopt == 2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) THEN
> > > > +           !
> > > > +           CALL fft_scatter( dfft, aux, nx3, dfft%nogrp*dfft%
> > > > tg_nnr, f, dfft%tg_nsw, dfft%tg_npp, iopt, use_tg )
> > > > +           !
> > > > +        ELSE
> > > > +           !
> > > > +           CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsw, dfft%npp, iopt )
> > > > +           !
> > > > +        ENDIF
> > > > +        !
> > > > +     ELSEIF( iopt == 1 ) THEN
> > > > +        !
> > > > +        CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsp, dfft%npp, iopt )
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +     RETURN
> > > > +  END SUBROUTINE fw_scatter
> > > > +
> > > > +  !
> > > > +
> > > > +  SUBROUTINE bw_scatter( iopt )
> > > > +     !
> > > > +     USE fft_base, ONLY: fft_scatter
> > > > +     !
> > > > +     INTEGER, INTENT(in) :: iopt
> > > > +     !
> > > > +     IF( iopt == -2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) THEN
> > > > +           !
> > > > +           CALL fft_scatter( dfft, aux, nx3, dfft%nogrp*dfft%
> > > > tg_nnr, f, dfft%tg_nsw, dfft%tg_npp, iopt, use_tg )
> > > > +           !
> > > > +        ELSE
> > > > +           !
> > > > +           CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsw, dfft%npp, iopt )
> > > > +           !
> > > > +        ENDIF
> > > > +        !
> > > > +     ELSEIF( iopt == -1 ) THEN
> > > > +        !
> > > > +        CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsp, dfft%npp, iopt )
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +     RETURN
> > > > +  END SUBROUTINE bw_scatter
> > > > +  !
> > > > +END SUBROUTINE tg_cft3s_scatter
> > > > +!
> > > > +!----------------------------------------------------------------------------
> > > > +SUBROUTINE tg_cft3s_xy( f, dfft, aux, isgn, use_task_groups )
> > > > +
> > > >  !----------------------------------------------------------------------------
> > > > +  !
> > > > +  USE fft_scalar, ONLY : cft_1z, cft_2xy
> > > > +  USE fft_base,   ONLY : fft_scatter
> > > > +  USE kinds,      ONLY : DP
> > > > +  USE fft_types,  ONLY : fft_dlay_descriptor
> > > > +  USE parallel_include
> > > > +
> > > > +  !
> > > > +  IMPLICIT NONE
> > > > +  !
> > > > +  COMPLEX(DP), INTENT(inout)    :: f( : ), aux( : )  ! array
> > > > containing data to be transformed
> > > > +  TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> > > > +                                           ! descriptor of fft
> > > > data layout
> > > > +  INTEGER, INTENT(in)           :: isgn    ! fft direction
> > > > +  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > +                                           ! specify if you
> > > > want to use task groups parallelization
> > > > +  !
> > > > +  INTEGER                    :: me_p
> > > > +  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> > > > +  COMPLEX(DP), ALLOCATABLE   :: yf(:)
> > > > +  INTEGER                    :: planes( dfft%nr1x )
> > > > +  LOGICAL                    :: use_tg
> > > > +  !
> > > > +  !
> > > > +  IF( present( use_task_groups ) ) THEN
> > > > +     use_tg = use_task_groups
> > > > +  ELSE
> > > > +     use_tg = .false.
> > > > +  ENDIF
> > > > +  !
> > > > +  IF( use_tg .and. .not. dfft%have_task_groups ) &
> > > > +     CALL errore( ' tg_cft3s ', ' call requiring task groups
> > > > for a descriptor without task groups ', 1 )
> > > > +  !
> > > > +  n1  = dfft%nr1
> > > > +  n2  = dfft%nr2
> > > > +  n3  = dfft%nr3
> > > > +  nx1 = dfft%nr1x
> > > > +  nx2 = dfft%nr2x
> > > > +  nx3 = dfft%nr3x
> > > > +  !
> > > > +  me_p = dfft%mype + 1
> > > > +  !
> > > > +  IF ( isgn > 0 ) THEN
> > > > +     !
> > > > +     IF ( isgn /= 2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) &
> > > > +           CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > +        !
> > > > +        planes = dfft%iplp
> > > > +        !
> > > > +     ELSE
> > > > +        !
> > > > +        planes = dfft%iplw
> > > > +        !
> > > > +     ENDIF
> > > > +     !
> > > > +     IF( use_tg ) THEN
> > > > +        CALL cft_2xy( f, dfft%tg_npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes )
> > > > +     ELSE
> > > > +        CALL cft_2xy( f, dfft%npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes )
> > > > +     ENDIF
> > > > +     !
> > > > +  ELSE
> > > > +     !
> > > > +     IF ( isgn /= -2 ) THEN
> > > > +        !
> > > > +        IF( use_tg ) &
> > > > +           CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > +        !
> > > > +        planes = dfft%iplp
> > > > +        !
> > > > +     ELSE
> > > > +        !
> > > > +        planes = dfft%iplw
> > > > +        !
> > > > +     ENDIF
> > > > +
> > > > +     IF( use_tg ) THEN
> > > > +        CALL cft_2xy( f, dfft%tg_npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes )
> > > > +     ELSE
> > > > +        CALL cft_2xy( f, dfft%npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes)
> > > > +     ENDIF
> > > > +     !
> > > > +  ENDIF
> > > > +  !
> > > > +  RETURN
> > > > +  !
> > > > +END SUBROUTINE tg_cft3s_xy
> > > > +!
> > > > +
> > > > END MODULE fft_parallel
> > > > 
> > > > Modified: trunk/espresso/Modules/fft_scalar.f90
> > > > ===================================================================
> > > > --- trunk/espresso/Modules/fft_scalar.f90 2015-02-23 10:58:36
> > > > UTC (rev 11406)
> > > > +++ trunk/espresso/Modules/fft_scalar.f90 2015-02-23 11:14:15
> > > > UTC (rev 11407)
> > > > @@ -16,11 +16,20 @@
> > > > !--------------------------------------------------------------------------!
> > > > 
> > > > #include "fft_defs.h"
> > > > +
> > > > +#if defined __DFTI
> > > > +#include "mkl_dfti.f90"
> > > > +#endif
> > > > +
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > >   MODULE fft_scalar
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > >       USE kinds
> > > > 
> > > > +#if defined __DFTI
> > > > +       USE MKL_DFTI ! -- this can be found int he MKL include
> > > > directory
> > > > +#endif
> > > > +
> > > >        IMPLICIT NONE
> > > >        SAVE
> > > > 
> > > > @@ -77,6 +86,12 @@
> > > > 
> > > > #endif
> > > > 
> > > > +#if defined __DFTI
> > > > +        TYPE dfti_descriptor_array
> > > > +           TYPE(DFTI_DESCRIPTOR), POINTER :: desc
> > > > +        END TYPE
> > > > +#endif
> > > > +
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > >   CONTAINS
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > > @@ -95,6 +110,10 @@
> > > > 
> > > >   SUBROUTINE cft_1z(c, nsl, nz, ldz, isign, cout)
> > > > 
> > > > +#if defined __DFTI
> > > > +     USE iso_c_binding
> > > > +#endif
> > > > +
> > > > !     driver routine for nsl 1d complex fft's of length nz
> > > > !     ldz >= nz is the distance between sequences to be
> > > > transformed
> > > > !     (ldz>nz is used on some architectures to reduce memory
> > > > conflicts)
> > > > @@ -142,6 +161,21 @@
> > > >     C_POINTER, SAVE :: fw_planz( ndims ) = 0
> > > >     C_POINTER, SAVE :: bw_planz( ndims ) = 0
> > > > 
> > > > +#elif defined __DFTI
> > > > +
> > > > +     !   Intel MKL native FFT driver
> > > > +
> > > > +     TYPE(DFTI_DESCRIPTOR_ARRAY), SAVE :: hand( ndims )
> > > > +     LOGICAL, SAVE :: dfti_first = .TRUE.
> > > > +     INTEGER :: dfti_status = 0
> > > > +     !
> > > > +     IF( dfti_first .EQ. .TRUE. ) THEN
> > > > +        DO ip = 1, ndims
> > > > +           hand(ip)%desc => NULL()
> > > > +        END DO
> > > > +        dfti_first = .FALSE.
> > > > +     END IF
> > > > +
> > > > #elif defined __ESSL || defined __LINUX_ESSL
> > > > 
> > > >     !   ESSL IBM library: see the ESSL manual for DCFT
> > > > @@ -207,7 +241,6 @@
> > > >       !   initialize a new one
> > > > 
> > > >       ! WRITE( stdout, fmt="('DEBUG cft_1z, reinitializing
> > > > tables ', I3)" ) icurrent
> > > > -
> > > > #if defined __FFTW
> > > > 
> > > >       IF( fw_planz( icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_1D( fw_planz( icurrent) )
> > > > @@ -227,6 +260,53 @@
> > > >       CALL dfftw_plan_many_dft( bw_planz( icurrent), 1, nz, nsl,
> > > > c, &
> > > >            (/SIZE(c)/), 1, ldz, cout, (/SIZE(cout)/), 1, ldz,
> > > > idir, FFTW_ESTIMATE)
> > > > 
> > > > +#elif defined __DFTI
> > > > +
> > > > +       if( ASSOCIATED( hand( icurrent )%desc ) ) THEN
> > > > +          dfti_status = DftiFreeDescriptor( hand( icurrent )%
> > > > desc )
> > > > +          IF( dfti_status /= 0) THEN
> > > > +             WRITE(*,*) "stopped in DftiFreeDescriptor",
> > > > dfti_status
> > > > +             STOP
> > > > +          ENDIF
> > > > +       END IF
> > > > +
> > > > +     dfti_status = DftiCreateDescriptor(hand( icurrent )%desc,
> > > > DFTI_DOUBLE, DFTI_COMPLEX, 1,nz)
> > > > +     IF(dfti_status /= 0) THEN
> > > > +        WRITE(*,*) "stopped in DftiCreateDescriptor",
> > > > dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +     dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_NUMBER_OF_TRANSFORMS,nsl)
> > > > +     IF(dfti_status /= 0)THEN
> > > > +        WRITE(*,*) "stopped in DFTI_NUMBER_OF_TRANSFORMS",
> > > > dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +     dfti_status = DftiSetValue(hand( icurrent )%
> > > > desc,DFTI_INPUT_DISTANCE, ldz )
> > > > +     IF(dfti_status /= 0)THEN
> > > > +        WRITE(*,*) "stopped in DFTI_INPUT_DISTANCE",
> > > > dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +     dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_PLACEMENT, DFTI_INPLACE)
> > > > +     IF(dfti_status /= 0)THEN
> > > > +        WRITE(*,*) "stopped in DFTI_PLACEMENT", dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +     tscale = 1.0_DP/nz
> > > > +     dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_FORWARD_SCALE, tscale);
> > > > +     IF(dfti_status /= 0)THEN
> > > > +        WRITE(*,*) "stopped in DFTI_FORWARD_SCALE", dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +     dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_BACKWARD_SCALE, DBLE(1) );
> > > > +     IF(dfti_status /= 0)THEN
> > > > +        WRITE(*,*) "stopped in DFTI_BACKWARD_SCALE",
> > > > dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +     dfti_status = DftiCommitDescriptor(hand( icurrent )%desc)
> > > > +     IF(dfti_status /= 0)THEN
> > > > +        WRITE(*,*) "stopped in DftiCommitDescriptor",
> > > > dfti_status
> > > > +        STOP
> > > > +     ENDIF
> > > > +
> > > > #elif defined __ESSL || defined __LINUX_ESSL
> > > > 
> > > >       tscale = 1.0_DP / nz
> > > > @@ -286,11 +366,9 @@
> > > >          CALL FFT_Z_STICK_SINGLE(fw_planz( ip), c(offset),
> > > > ldz_t)
> > > >       END DO
> > > > !$omp end do
> > > > +!$omp end parallel
> > > >       tscale = 1.0_DP / nz
> > > > -!$omp workshare
> > > >       cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl ) * tscale
> > > > -!$omp end workshare
> > > > -!$omp end parallel
> > > >     ELSE IF (isign > 0) THEN
> > > > !$omp parallel default(none) private(tid,offset,i)
> > > > shared(c,isign,nsl,bw_planz,ip,cout,ldz) &
> > > > !$omp &        firstprivate(ldz_t)
> > > > @@ -330,6 +408,24 @@
> > > >        CALL dfftw_execute_dft( bw_planz( ip), c, cout)
> > > >     END IF
> > > > 
> > > > +#elif defined __DFTI
> > > > +
> > > > +     IF (isign < 0) THEN
> > > > +        dfti_status = DftiComputeForward(hand(ip)%desc, c )
> > > > +        cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl )
> > > > +        IF(dfti_status /= 0) THEN
> > > > +           WRITE(*,*) "stopped in DftiComputeForward",
> > > > dfti_status
> > > > +           STOP
> > > > +        ENDIF
> > > > +     ELSE IF (isign > 0) THEN
> > > > +        dfti_status = DftiComputeBackward(hand(ip)%desc, c )
> > > > +        cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl )
> > > > +        IF(dfti_status /= 0) THEN
> > > > +           WRITE(*,*) "stopped in DftiComputeBackward",
> > > > dfti_status
> > > > +           STOP
> > > > +        ENDIF
> > > > +     END IF
> > > > +
> > > > #elif defined __SCSL
> > > > 
> > > >     IF ( isign < 0 ) THEN
> > > > @@ -415,6 +511,10 @@
> > > > 
> > > >   SUBROUTINE cft_2xy(r, nzl, nx, ny, ldx, ldy, isign, pl2ix)
> > > > 
> > > > +#if defined __DFTI
> > > > +     USE iso_c_binding
> > > > +#endif
> > > > +
> > > > !     driver routine for nzl 2d complex fft's of lengths nx and
> > > > ny
> > > > !     input : r(ldx*ldy)  complex, transform is in-place
> > > > !     ldx >= nx, ldy >= ny are the physical dimensions of the
> > > > equivalent
> > > > @@ -448,11 +548,23 @@
> > > >     EXTERNAL :: omp_get_thread_num, omp_get_num_threads
> > > > #endif
> > > > 
> > > > -#if defined __FFTW || defined __FFTW3
> > > > +#if defined __DFTI
> > > > 
> > > > +     TYPE(DFTI_DESCRIPTOR_ARRAY), SAVE :: hand( ndims )
> > > > +     LOGICAL, SAVE :: dfti_first = .TRUE.
> > > > +     INTEGER :: dfti_status = 0
> > > > +
> > > > +#elif defined __FFTW || defined __FFTW3
> > > > +
> > > > +#if defined __FFTW && __FFTW_ALL_XY_PLANES
> > > > +     C_POINTER, SAVE :: fw_plan_2d( ndims ) = 0
> > > > +     C_POINTER, SAVE :: bw_plan_2d( ndims ) = 0
> > > > +#else
> > > >     C_POINTER, SAVE :: fw_plan( 2, ndims ) = 0
> > > >     C_POINTER, SAVE :: bw_plan( 2, ndims ) = 0
> > > > +#endif
> > > > 
> > > > +
> > > > #elif defined __ESSL || defined __LINUX_ESSL
> > > > 
> > > >     INTEGER, PARAMETER :: ltabl = 20000 + 3 * nfftx
> > > > @@ -503,6 +615,14 @@
> > > >     !
> > > >     !   Here initialize table only if necessary
> > > >     !
> > > > +#if defined __DFTI
> > > > +     IF( dfti_first .EQ. .TRUE. ) THEN
> > > > +        DO ip = 1, ndims
> > > > +           hand(ip)%desc => NULL()
> > > > +        END DO
> > > > +        dfti_first = .FALSE.
> > > > +     END IF
> > > > +#endif
> > > > 
> > > >     DO ip = 1, ndims
> > > > 
> > > > @@ -524,8 +644,62 @@
> > > > 
> > > >       ! WRITE( stdout, fmt="('DEBUG cft_2xy, reinitializing
> > > > tables ', I3)" ) icurrent
> > > > 
> > > > -#if defined __FFTW
> > > > +#if defined __DFTI
> > > > 
> > > > +       if( ASSOCIATED( hand( icurrent )%desc ) ) THEN
> > > > +          dfti_status = DftiFreeDescriptor( hand( icurrent )%
> > > > desc )
> > > > +          IF( dfti_status /= 0) THEN
> > > > +             WRITE(*,*) "stopped in DftiFreeDescriptor",
> > > > dfti_status
> > > > +             STOP
> > > > +          ENDIF
> > > > +       END IF
> > > > +
> > > > +       dfti_status = DftiCreateDescriptor(hand( icurrent )%
> > > > desc, DFTI_DOUBLE, DFTI_COMPLEX, 2,(/nx,ny/))
> > > > +       IF(dfti_status /= 0) THEN
> > > > +          WRITE(*,*) "stopped in DftiCreateDescriptor",
> > > > dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +       dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_NUMBER_OF_TRANSFORMS,nzl)
> > > > +       IF(dfti_status /= 0)THEN
> > > > +          WRITE(*,*) "stopped in DFTI_NUMBER_OF_TRANSFORMS",
> > > > dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +       dfti_status = DftiSetValue(hand( icurrent )%
> > > > desc,DFTI_INPUT_DISTANCE, ldx*ldy )
> > > > +       IF(dfti_status /= 0)THEN
> > > > +          WRITE(*,*) "stopped in DFTI_INPUT_DISTANCE",
> > > > dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +       dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_PLACEMENT, DFTI_INPLACE)
> > > > +       IF(dfti_status /= 0)THEN
> > > > +          WRITE(*,*) "stopped in DFTI_PLACEMENT", dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +       tscale = 1.0_DP/ (nx * ny )
> > > > +       dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_FORWARD_SCALE, tscale);
> > > > +       IF(dfti_status /= 0)THEN
> > > > +          WRITE(*,*) "stopped in DFTI_FORWARD_SCALE",
> > > > dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +       dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_BACKWARD_SCALE, DBLE(1) );
> > > > +       IF(dfti_status /= 0)THEN
> > > > +          WRITE(*,*) "stopped in DFTI_BACKWARD_SCALE",
> > > > dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +       dfti_status = DftiCommitDescriptor(hand( icurrent )%
> > > > desc)
> > > > +       IF(dfti_status /= 0)THEN
> > > > +          WRITE(*,*) "stopped in DftiCommitDescriptor",
> > > > dfti_status
> > > > +          STOP
> > > > +       ENDIF
> > > > +
> > > > +
> > > > +#elif defined __FFTW
> > > > +
> > > > +#if defined __FFTW_ALL_XY_PLANES
> > > > +       IF( fw_plan_2d( icurrent) /= 0 )  CALL
> > > > DESTROY_PLAN_2D(fw_plan_2d(icurrent) )
> > > > +       IF( bw_plan_2d( icurrent) /= 0 )  CALL
> > > > DESTROY_PLAN_2D(bw_plan_2d(icurrent) )
> > > > +       idir = -1; CALL CREATE_PLAN_2D( fw_plan_2d(icurrent),
> > > > nx, ny, idir)
> > > > +       idir =  1; CALL CREATE_PLAN_2D( bw_plan_2d(icurrent),
> > > > nx, ny, idir)
> > > > +#else
> > > >       IF( fw_plan( 2,icurrent) /= 0 )   CALL
> > > > DESTROY_PLAN_1D( fw_plan( 2,icurrent) )
> > > >       IF( bw_plan( 2,icurrent) /= 0 )   CALL
> > > > DESTROY_PLAN_1D( bw_plan( 2,icurrent) )
> > > >       idir = -1; CALL CREATE_PLAN_1D( fw_plan( 2,icurrent), ny,
> > > > idir)
> > > > @@ -535,6 +709,7 @@
> > > >       IF( bw_plan( 1,icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_1D( bw_plan( 1,icurrent) )
> > > >       idir = -1; CALL CREATE_PLAN_1D( fw_plan( 1,icurrent), nx,
> > > > idir)
> > > >       idir =  1; CALL CREATE_PLAN_1D( bw_plan( 1,icurrent), nx,
> > > > idir)
> > > > +#endif
> > > > 
> > > > #elif defined __FFTW3
> > > > 
> > > > @@ -645,10 +820,46 @@
> > > > #endif
> > > > 
> > > > 
> > > > -#if defined __FFTW
> > > > +#if defined __DFTI
> > > > 
> > > > -#if defined __OPENMP
> > > > +     IF( isign < 0 ) THEN
> > > > +        !
> > > > +        dfti_status = DftiComputeForward(hand(ip)%desc, r(:))
> > > > +        IF(dfti_status /= 0)THEN
> > > > +           WRITE(*,*) "stopped in DftiComputeForward",
> > > > dfti_status
> > > > +           STOP
> > > > +        ENDIF
> > > > +        !
> > > > +     ELSE IF( isign > 0 ) THEN
> > > > +        !
> > > > +        dfti_status = DftiComputeBackward(hand(ip)%desc, r(:))
> > > > +        IF(dfti_status /= 0)THEN
> > > > +           WRITE(*,*) "stopped in DftiComputeBackward",
> > > > dfti_status
> > > > +           STOP
> > > > +        ENDIF
> > > > +        !
> > > > +     END IF
> > > > 
> > > > +
> > > > +#elif defined __FFTW
> > > > +
> > > > +#if defined __FFTW_ALL_XY_PLANES
> > > > +
> > > > +     IF( isign < 0 ) THEN
> > > > +        !
> > > > +        tscale = 1.0_DP / ( nx * ny )
> > > > +        !
> > > > +        CALL fftw_inplace_drv_2d( fw_plan_2d(ip), nzl, r(1), 1,
> > > > ldx*ldy )
> > > > +        CALL ZDSCAL( ldx * ldy * nzl, tscale, r(1), 1)
> > > > +        !
> > > > +     ELSE IF( isign > 0 ) THEN
> > > > +        !
> > > > +        CALL fftw_inplace_drv_2d( bw_plan_2d(ip), nzl, r(1), 1,
> > > > ldx*ldy )
> > > > +        !
> > > > +     END IF
> > > > +
> > > > +#elif defined __OPENMP
> > > > +
> > > >     nx_t  = nx
> > > >     ny_t  = ny
> > > >     nzl_t = nzl
> > > > @@ -1980,6 +2191,7 @@
> > > >  !
> > > >  if (mod (n, 2) ==0) nx = n + 1
> > > >  ! for nec vector machines: if n is even increase dimension by 1
> > > > +  !
> > > > #endif
> > > >  !
> > > >  good_fft_dimension = nx
> > > > 
> > > > Modified: trunk/espresso/Modules/wavefunctions.f90
> > > > ===================================================================
> > > > --- trunk/espresso/Modules/wavefunctions.f90 2015-02-23 10:58:36
> > > > UTC (rev 11406)
> > > > +++ trunk/espresso/Modules/wavefunctions.f90 2015-02-23 11:14:15
> > > > UTC (rev 11407)
> > > > @@ -28,6 +28,7 @@
> > > >     ! electronic wave functions, CPV code
> > > >     ! distributed over gvector and bands
> > > >     !
> > > > +!dir$ attributes align: 4096 :: c0_bgrp, cm_bgrp, phi_bgrp
> > > >     COMPLEX(DP), ALLOCATABLE :: c0_bgrp(:,:)  ! wave functions
> > > > at time t
> > > >     COMPLEX(DP), ALLOCATABLE :: cm_bgrp(:,:)  ! wave functions
> > > > at time t-delta t
> > > >     COMPLEX(DP), ALLOCATABLE :: phi_bgrp(:,:) ! |phi> = s'|c0> =
> > > > |c0> + sum q_ij |i><j|c0>
> > > > 
> > > > _______________________________________________
> > > > Q-e-commits mailing list
> > > > Q-e-commits at qe-forge.org
> > > > http://qe-forge.org/mailman/listinfo/q-e-commits
> > > 
> > > --
> > > Mr. Filippo SPIGA, M.Sc.
> > > http://filippospiga.info ~ skype: filippo.spiga
> > > 
> > > «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
> > > 
> > > *****
> > > Disclaimer: "Please note this message and any attachments are
> > > CONFIDENTIAL and may be privileged or otherwise protected from
> > > disclosure. The contents are not to be disclosed to anyone other
> > > than the addressee. Unauthorized recipients are requested to
> > > preserve this confidentiality and to advise the sender immediately
> > > of any error in transmission."
> > > 
> > > 
> > 
> > 
> 
> --
> Mr. Filippo SPIGA, M.Sc.
> http://filippospiga.info ~ skype: filippo.spiga
> 
> «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
> 
> *****
> Disclaimer: "Please note this message and any attachments are
> CONFIDENTIAL and may be privileged or otherwise protected from
> disclosure. The contents are not to be disclosed to anyone other than
> the addressee. Unauthorized recipients are requested to preserve this
> confidentiality and to advise the sender immediately of any error in
> transmission."
> 
> 
> 
> 

-- 
 Paolo Giannozzi, Dept. Chemistry&Physics&Environment, 
 Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
 Phone +39-0432-558216, fax +39-0432-558222 




More information about the developers mailing list