[Q-e-developers] Fwd: [Q-e-commits] r11407 - trunk/espresso/Modules

Filippo Spiga spiga.filippo at gmail.com
Mon Feb 23 23:12:56 CET 2015


Ciao Carlo,

I spot that for a old (how old Paolo G?) version of Intel compiler the fft_parallel.f90 files with your latest changes generate a internal compiler error. It happens with and without MPI support.

See: 
- http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%28HYBRID%29
- http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%28SERIAL%29/builds/4
- http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%28SERIAL%29/builds/4/steps/make%20all/logs/stdio

Did you test these changes using which version of Intel?

Cheers,
Filippo

Begin forwarded message:
> Date: February 23, 2015 at 11:14:16 AM GMT
> From: ccavazzoni <ccavazzoni at qeforge.qe-forge.org>
> To: q-e-commits at qe-forge.org
> Subject: [Q-e-commits] r11407 - trunk/espresso/Modules
> Reply-To: Quantum ESPRESSO svn commit messages <q-e-commits at qe-forge.org>
> 
> Author: ccavazzoni
> Date: 2015-02-23 12:14:15 +0100 (Mon, 23 Feb 2015)
> New Revision: 11407
> 
> Modified:
>   trunk/espresso/Modules/fft_parallel.f90
>   trunk/espresso/Modules/fft_scalar.f90
>   trunk/espresso/Modules/wavefunctions.f90
> Log:
> - adding memory alignment directives to have a performance improvement on Intel architecure (CPU+Network),
>  only meaningful for intel compiler, shold be of no arm for all the other
> - Intel DFTI MKL fft interface back in again (with __DFTI) in fft_scalar, some issues with the fftw3 interface
>  is prevending porting to Xeon PHI processor (no arm to all other procs/fft)
> - A split in thress different subs of the general driver (tg_cft3s) is added to the fft_parallel module,
>  to support software pipelining optimizations, to mask communication and data transfer latency.
> 
> 
> Modified: trunk/espresso/Modules/fft_parallel.f90
> ===================================================================
> --- trunk/espresso/Modules/fft_parallel.f90	2015-02-23 10:58:36 UTC (rev 11406)
> +++ trunk/espresso/Modules/fft_parallel.f90	2015-02-23 11:14:15 UTC (rev 11407)
> @@ -73,6 +73,7 @@
>   LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
>                                            ! specify if you want to use task groups parallelization
>   !
> +!dir$ attributes align: 4096 :: yf, aux
>   INTEGER                    :: me_p
>   INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
>   COMPLEX(DP), ALLOCATABLE   :: yf(:), aux (:)
> @@ -329,4 +330,418 @@
>   !
> END SUBROUTINE tg_cft3s
> !
> +!
> +!
> +!----------------------------------------------------------------------------
> +SUBROUTINE tg_cft3s_z( f, dfft, aux, isgn, use_task_groups )
> +  !----------------------------------------------------------------------------
> +  !
> +  USE fft_scalar, ONLY : cft_1z, cft_2xy
> +  USE fft_base,   ONLY : fft_scatter
> +  USE kinds,      ONLY : DP
> +  USE fft_types,  ONLY : fft_dlay_descriptor
> +  USE parallel_include
> +
> +  !
> +  IMPLICIT NONE
> +  !
> +  COMPLEX(DP), INTENT(inout)    :: f( : )  ! array containing data to be transformed
> +  COMPLEX(DP), INTENT(inout)   :: aux (:)
> +  TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> +                                           ! descriptor of fft data layout
> +  INTEGER, INTENT(in)           :: isgn    ! fft direction
> +  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> +                                           ! specify if you want to use task groups parallelization
> +  !
> +  INTEGER                    :: me_p
> +  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> +  COMPLEX(DP), ALLOCATABLE   :: yf(:)
> +  INTEGER                    :: planes( dfft%nr1x )
> +  LOGICAL                    :: use_tg
> +  !
> +  !
> +  IF( present( use_task_groups ) ) THEN
> +     use_tg = use_task_groups
> +  ELSE
> +     use_tg = .false.
> +  ENDIF
> +  !
> +  IF( use_tg .and. .not. dfft%have_task_groups ) &
> +     CALL errore( ' tg_cft3s_x ', ' call requiring task groups for a descriptor without task groups ', 1 )
> +  !
> +  n1  = dfft%nr1
> +  n2  = dfft%nr2
> +  n3  = dfft%nr3
> +  nx1 = dfft%nr1x
> +  nx2 = dfft%nr2x
> +  nx3 = dfft%nr3x
> +  !
> +  IF( use_tg ) THEN
> +     ALLOCATE( YF ( dfft%nogrp * dfft%tg_nnr ) )
> +  ENDIF
> +  !
> +  me_p = dfft%mype + 1
> +  !
> +  IF ( isgn > 0 ) THEN
> +     !
> +     IF ( isgn /= 2 ) THEN
> +        !
> +        IF( use_tg ) &
> +           CALL errore( ' tg_cft3s ', ' task groups on large mesh not implemented ', 1 )
> +        !
> +        CALL cft_1z( f, dfft%nsp( me_p ), n3, nx3, isgn, aux )
> +        !
> +     ELSE
> +        !
> +        CALL pack_group_sticks()
> +        !
> +        IF( use_tg ) THEN
> +           CALL cft_1z( yf, dfft%tg_nsw( me_p ), n3, nx3, isgn, aux )
> +        ELSE
> +           CALL cft_1z( f, dfft%nsw( me_p ), n3, nx3, isgn, aux )
> +        ENDIF
> +        !
> +     ENDIF
> +     !
> +  ELSE
> +     !
> +     IF ( isgn /= -2 ) THEN
> +        !
> +        IF( use_tg ) &
> +           CALL errore( ' tg_cft3s ', ' task groups on large mesh not implemented ', 1 )
> +        !
> +     ENDIF
> +
> +     IF ( isgn /= -2 ) THEN
> +        !
> +        CALL cft_1z( aux, dfft%nsp( me_p ), n3, nx3, isgn, f )
> +        !
> +     ELSE
> +        !
> +        IF( use_tg ) THEN
> +           CALL cft_1z( aux, dfft%tg_nsw( me_p ), n3, nx3, isgn, yf )
> +        ELSE
> +           CALL cft_1z( aux, dfft%nsw( me_p ), n3, nx3, isgn, f )
> +        ENDIF
> +        !
> +        CALL unpack_group_sticks()
> +        !
> +     ENDIF
> +     !
> +  ENDIF
> +  !
> +  IF( use_tg ) THEN
> +     DEALLOCATE( yf )
> +  ENDIF
> +  !
> +  RETURN
> +  !
> +CONTAINS
> +  !
> +  SUBROUTINE pack_group_sticks()
> +
> +     INTEGER                     :: ierr
> +     !
> +     IF( .not. use_tg ) RETURN
> +     !
> +     IF( dfft%tg_rdsp(dfft%nogrp) + dfft%tg_rcv(dfft%nogrp) > size( yf ) ) THEN
> +        CALL errore( 'pack_group_sticks' , ' inconsistent size ', 1 )
> +     ENDIF
> +     IF( dfft%tg_psdsp(dfft%nogrp) + dfft%tg_snd(dfft%nogrp) > size( f ) ) THEN
> +        CALL errore( 'pack_group_sticks', ' inconsistent size ', 2 )
> +     ENDIF
> +
> +     CALL start_clock( 'ALLTOALL' )
> +     !
> +     !  Collect all the sticks of the different states,
> +     !  in "yf" processors will have all the sticks of the OGRP
> +
> +#if defined __MPI
> +
> +     CALL MPI_ALLTOALLV( f(1), dfft%tg_snd, dfft%tg_psdsp, MPI_DOUBLE_COMPLEX, yf(1), dfft%tg_rcv, &
> +      &                     dfft%tg_rdsp, MPI_DOUBLE_COMPLEX, dfft%ogrp_comm, IERR)
> +     IF( ierr /= 0 ) THEN
> +        CALL errore( 'pack_group_sticks', ' alltoall error 1 ', abs(ierr) )
> +     ENDIF
> +
> +#endif
> +
> +     CALL stop_clock( 'ALLTOALL' )
> +     !
> +     !YF Contains all ( ~ NOGRP*dfft%nsw(me) ) Z-sticks
> +     !
> +     RETURN
> +  END SUBROUTINE pack_group_sticks
> +
> +  !
> +
> +  SUBROUTINE unpack_group_sticks()
> +     !
> +     !  Bring pencils back to their original distribution
> +     !
> +     INTEGER                     :: ierr
> +     !
> +     IF( .not. use_tg ) RETURN
> +     !
> +     IF( dfft%tg_usdsp(dfft%nogrp) + dfft%tg_snd(dfft%nogrp) > size( f ) ) THEN
> +        CALL errore( 'unpack_group_sticks', ' inconsistent size ', 3 )
> +     ENDIF
> +     IF( dfft%tg_rdsp(dfft%nogrp) + dfft%tg_rcv(dfft%nogrp) > size( yf ) ) THEN
> +        CALL errore( 'unpack_group_sticks', ' inconsistent size ', 4 )
> +     ENDIF
> +
> +     CALL start_clock( 'ALLTOALL' )
> +
> +#if defined __MPI
> +     CALL MPI_Alltoallv( yf(1), &
> +          dfft%tg_rcv, dfft%tg_rdsp, MPI_DOUBLE_COMPLEX, f(1), &
> +          dfft%tg_snd, dfft%tg_usdsp, MPI_DOUBLE_COMPLEX, dfft%ogrp_comm, IERR)
> +     IF( ierr /= 0 ) THEN
> +        CALL errore( 'unpack_group_sticks', ' alltoall error 2 ', abs(ierr) )
> +     ENDIF
> +#endif
> +
> +     CALL stop_clock( 'ALLTOALL' )
> +
> +     RETURN
> +  END SUBROUTINE unpack_group_sticks
> +  !
> +END SUBROUTINE tg_cft3s_z
> +!
> +!
> +!----------------------------------------------------------------------------
> +SUBROUTINE tg_cft3s_scatter( f, dfft, aux, isgn, use_task_groups )
> +  !----------------------------------------------------------------------------
> +  !
> +  USE fft_scalar, ONLY : cft_1z, cft_2xy
> +  USE fft_base,   ONLY : fft_scatter
> +  USE kinds,      ONLY : DP
> +  USE fft_types,  ONLY : fft_dlay_descriptor
> +  USE parallel_include
> +
> +  !
> +  IMPLICIT NONE
> +  !
> +  COMPLEX(DP), INTENT(inout)    :: f( : ), aux( : )  ! array containing data to be transformed
> +  TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> +                                           ! descriptor of fft data layout
> +  INTEGER, INTENT(in)           :: isgn    ! fft direction
> +  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> +                                           ! specify if you want to use task groups parallelization
> +  !
> +  INTEGER                    :: me_p
> +  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> +  INTEGER                    :: planes( dfft%nr1x )
> +  LOGICAL                    :: use_tg
> +  !
> +  !
> +  IF( present( use_task_groups ) ) THEN
> +     use_tg = use_task_groups
> +  ELSE
> +     use_tg = .false.
> +  ENDIF
> +  !
> +  IF( use_tg .and. .not. dfft%have_task_groups ) &
> +     CALL errore( ' tg_cft3s ', ' call requiring task groups for a descriptor without task groups ', 1 )
> +  !
> +  n1  = dfft%nr1
> +  n2  = dfft%nr2
> +  n3  = dfft%nr3
> +  nx1 = dfft%nr1x
> +  nx2 = dfft%nr2x
> +  nx3 = dfft%nr3x
> +  !
> +  me_p = dfft%mype + 1
> +  !
> +  IF ( isgn > 0 ) THEN
> +     !
> +     IF ( isgn /= 2 ) THEN
> +        !
> +        IF( use_tg ) &
> +           CALL errore( ' tg_cft3s ', ' task groups on large mesh not implemented ', 1 )
> +        !
> +     ENDIF
> +     !
> +     CALL fw_scatter( isgn ) ! forwart scatter from stick to planes
> +     !
> +  ELSE
> +     !
> +     IF ( isgn /= -2 ) THEN
> +        !
> +        IF( use_tg ) &
> +           CALL errore( ' tg_cft3s ', ' task groups on large mesh not implemented ', 1 )
> +        !
> +     ENDIF
> +     !
> +     CALL bw_scatter( isgn )
> +     !
> +  ENDIF
> +  !
> +  RETURN
> +  !
> +CONTAINS
> +  !
> +  SUBROUTINE fw_scatter( iopt )
> +
> +        !Transpose data for the 2-D FFT on the x-y plane
> +        !
> +        !NOGRP*dfft%nnr: The length of aux and f
> +        !nr3x: The length of each Z-stick
> +        !aux: input - output
> +        !f: working space
> +        !isgn: type of scatter
> +        !dfft%nsw(me) holds the number of Z-sticks proc. me has.
> +        !dfft%npp: number of planes per processor
> +        !
> +     !
> +     USE fft_base, ONLY: fft_scatter
> +     !
> +     INTEGER, INTENT(in) :: iopt
> +     !
> +     IF( iopt == 2 ) THEN
> +        !
> +        IF( use_tg ) THEN
> +           !
> +           CALL fft_scatter( dfft, aux, nx3, dfft%nogrp*dfft%tg_nnr, f, dfft%tg_nsw, dfft%tg_npp, iopt, use_tg )
> +           !
> +        ELSE
> +           !
> +           CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%nsw, dfft%npp, iopt )
> +           !
> +        ENDIF
> +        !
> +     ELSEIF( iopt == 1 ) THEN
> +        !
> +        CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%nsp, dfft%npp, iopt )
> +        !
> +     ENDIF
> +     !
> +     RETURN
> +  END SUBROUTINE fw_scatter
> +
> +  !
> +
> +  SUBROUTINE bw_scatter( iopt )
> +     !
> +     USE fft_base, ONLY: fft_scatter
> +     !
> +     INTEGER, INTENT(in) :: iopt
> +     !
> +     IF( iopt == -2 ) THEN
> +        !
> +        IF( use_tg ) THEN
> +           !
> +           CALL fft_scatter( dfft, aux, nx3, dfft%nogrp*dfft%tg_nnr, f, dfft%tg_nsw, dfft%tg_npp, iopt, use_tg )
> +           !
> +        ELSE
> +           !
> +           CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%nsw, dfft%npp, iopt )
> +           !
> +        ENDIF
> +        !
> +     ELSEIF( iopt == -1 ) THEN
> +        !
> +        CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%nsp, dfft%npp, iopt )
> +        !
> +     ENDIF
> +     !
> +     RETURN
> +  END SUBROUTINE bw_scatter
> +  !
> +END SUBROUTINE tg_cft3s_scatter
> +!
> +!----------------------------------------------------------------------------
> +SUBROUTINE tg_cft3s_xy( f, dfft, aux, isgn, use_task_groups )
> +  !----------------------------------------------------------------------------
> +  !
> +  USE fft_scalar, ONLY : cft_1z, cft_2xy
> +  USE fft_base,   ONLY : fft_scatter
> +  USE kinds,      ONLY : DP
> +  USE fft_types,  ONLY : fft_dlay_descriptor
> +  USE parallel_include
> +
> +  !
> +  IMPLICIT NONE
> +  !
> +  COMPLEX(DP), INTENT(inout)    :: f( : ), aux( : )  ! array containing data to be transformed
> +  TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> +                                           ! descriptor of fft data layout
> +  INTEGER, INTENT(in)           :: isgn    ! fft direction
> +  LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> +                                           ! specify if you want to use task groups parallelization
> +  !
> +  INTEGER                    :: me_p
> +  INTEGER                    :: n1, n2, n3, nx1, nx2, nx3
> +  COMPLEX(DP), ALLOCATABLE   :: yf(:)
> +  INTEGER                    :: planes( dfft%nr1x )
> +  LOGICAL                    :: use_tg
> +  !
> +  !
> +  IF( present( use_task_groups ) ) THEN
> +     use_tg = use_task_groups
> +  ELSE
> +     use_tg = .false.
> +  ENDIF
> +  !
> +  IF( use_tg .and. .not. dfft%have_task_groups ) &
> +     CALL errore( ' tg_cft3s ', ' call requiring task groups for a descriptor without task groups ', 1 )
> +  !
> +  n1  = dfft%nr1
> +  n2  = dfft%nr2
> +  n3  = dfft%nr3
> +  nx1 = dfft%nr1x
> +  nx2 = dfft%nr2x
> +  nx3 = dfft%nr3x
> +  !
> +  me_p = dfft%mype + 1
> +  !
> +  IF ( isgn > 0 ) THEN
> +     !
> +     IF ( isgn /= 2 ) THEN
> +        !
> +        IF( use_tg ) &
> +           CALL errore( ' tg_cft3s ', ' task groups on large mesh not implemented ', 1 )
> +        !
> +        planes = dfft%iplp
> +        !
> +     ELSE
> +        !
> +        planes = dfft%iplw
> +        !
> +     ENDIF
> +     !
> +     IF( use_tg ) THEN
> +        CALL cft_2xy( f, dfft%tg_npp( me_p ), n1, n2, nx1, nx2, isgn, planes )
> +     ELSE
> +        CALL cft_2xy( f, dfft%npp( me_p ), n1, n2, nx1, nx2, isgn, planes )
> +     ENDIF
> +     !
> +  ELSE
> +     !
> +     IF ( isgn /= -2 ) THEN
> +        !
> +        IF( use_tg ) &
> +           CALL errore( ' tg_cft3s ', ' task groups on large mesh not implemented ', 1 )
> +        !
> +        planes = dfft%iplp
> +        !
> +     ELSE
> +        !
> +        planes = dfft%iplw
> +        !
> +     ENDIF
> +
> +     IF( use_tg ) THEN
> +        CALL cft_2xy( f, dfft%tg_npp( me_p ), n1, n2, nx1, nx2, isgn, planes )
> +     ELSE
> +        CALL cft_2xy( f, dfft%npp( me_p ), n1, n2, nx1, nx2, isgn, planes)
> +     ENDIF
> +     !
> +  ENDIF
> +  !
> +  RETURN
> +  !
> +END SUBROUTINE tg_cft3s_xy
> +!
> +
> END MODULE fft_parallel
> 
> Modified: trunk/espresso/Modules/fft_scalar.f90
> ===================================================================
> --- trunk/espresso/Modules/fft_scalar.f90	2015-02-23 10:58:36 UTC (rev 11406)
> +++ trunk/espresso/Modules/fft_scalar.f90	2015-02-23 11:14:15 UTC (rev 11407)
> @@ -16,11 +16,20 @@
> !--------------------------------------------------------------------------!
> 
> #include "fft_defs.h"
> +
> +#if defined __DFTI
> +#include "mkl_dfti.f90"
> +#endif
> +
> !=----------------------------------------------------------------------=!
>    MODULE fft_scalar
> !=----------------------------------------------------------------------=!
>        USE kinds
> 
> +#if defined __DFTI
> +       USE MKL_DFTI ! -- this can be found int he MKL include directory
> +#endif
> +
>         IMPLICIT NONE
>         SAVE
> 
> @@ -77,6 +86,12 @@
> 
> #endif
> 
> +#if defined __DFTI
> +        TYPE dfti_descriptor_array
> +           TYPE(DFTI_DESCRIPTOR), POINTER :: desc
> +        END TYPE
> +#endif
> +
> !=----------------------------------------------------------------------=!
>    CONTAINS
> !=----------------------------------------------------------------------=!
> @@ -95,6 +110,10 @@
> 
>    SUBROUTINE cft_1z(c, nsl, nz, ldz, isign, cout)
> 
> +#if defined __DFTI
> +     USE iso_c_binding
> +#endif
> +
> !     driver routine for nsl 1d complex fft's of length nz
> !     ldz >= nz is the distance between sequences to be transformed
> !     (ldz>nz is used on some architectures to reduce memory conflicts)
> @@ -142,6 +161,21 @@
>      C_POINTER, SAVE :: fw_planz( ndims ) = 0
>      C_POINTER, SAVE :: bw_planz( ndims ) = 0
> 
> +#elif defined __DFTI
> +
> +     !   Intel MKL native FFT driver
> +
> +     TYPE(DFTI_DESCRIPTOR_ARRAY), SAVE :: hand( ndims )
> +     LOGICAL, SAVE :: dfti_first = .TRUE.
> +     INTEGER :: dfti_status = 0
> +     !
> +     IF( dfti_first .EQ. .TRUE. ) THEN
> +        DO ip = 1, ndims
> +           hand(ip)%desc => NULL()
> +        END DO
> +        dfti_first = .FALSE.
> +     END IF
> +
> #elif defined __ESSL || defined __LINUX_ESSL
> 
>      !   ESSL IBM library: see the ESSL manual for DCFT
> @@ -207,7 +241,6 @@
>        !   initialize a new one
> 
>        ! WRITE( stdout, fmt="('DEBUG cft_1z, reinitializing tables ', I3)" ) icurrent
> -
> #if defined __FFTW
> 
>        IF( fw_planz( icurrent) /= 0 ) CALL DESTROY_PLAN_1D( fw_planz( icurrent) )
> @@ -227,6 +260,53 @@
>        CALL dfftw_plan_many_dft( bw_planz( icurrent), 1, nz, nsl, c, &
>             (/SIZE(c)/), 1, ldz, cout, (/SIZE(cout)/), 1, ldz, idir, FFTW_ESTIMATE)
> 
> +#elif defined __DFTI
> +
> +       if( ASSOCIATED( hand( icurrent )%desc ) ) THEN
> +          dfti_status = DftiFreeDescriptor( hand( icurrent )%desc )
> +          IF( dfti_status /= 0) THEN
> +             WRITE(*,*) "stopped in DftiFreeDescriptor", dfti_status
> +             STOP
> +          ENDIF
> +       END IF
> +
> +     dfti_status = DftiCreateDescriptor(hand( icurrent )%desc, DFTI_DOUBLE, DFTI_COMPLEX, 1,nz)
> +     IF(dfti_status /= 0) THEN
> +        WRITE(*,*) "stopped in DftiCreateDescriptor", dfti_status
> +        STOP
> +     ENDIF
> +     dfti_status = DftiSetValue(hand( icurrent )%desc, DFTI_NUMBER_OF_TRANSFORMS,nsl)
> +     IF(dfti_status /= 0)THEN
> +        WRITE(*,*) "stopped in DFTI_NUMBER_OF_TRANSFORMS", dfti_status
> +        STOP
> +     ENDIF
> +     dfti_status = DftiSetValue(hand( icurrent )%desc,DFTI_INPUT_DISTANCE, ldz )
> +     IF(dfti_status /= 0)THEN
> +        WRITE(*,*) "stopped in DFTI_INPUT_DISTANCE", dfti_status
> +        STOP
> +     ENDIF
> +     dfti_status = DftiSetValue(hand( icurrent )%desc, DFTI_PLACEMENT, DFTI_INPLACE)
> +     IF(dfti_status /= 0)THEN
> +        WRITE(*,*) "stopped in DFTI_PLACEMENT", dfti_status
> +        STOP
> +     ENDIF
> +     tscale = 1.0_DP/nz
> +     dfti_status = DftiSetValue( hand( icurrent )%desc, DFTI_FORWARD_SCALE, tscale);
> +     IF(dfti_status /= 0)THEN
> +        WRITE(*,*) "stopped in DFTI_FORWARD_SCALE", dfti_status
> +        STOP
> +     ENDIF
> +     dfti_status = DftiSetValue( hand( icurrent )%desc, DFTI_BACKWARD_SCALE, DBLE(1) );
> +     IF(dfti_status /= 0)THEN
> +        WRITE(*,*) "stopped in DFTI_BACKWARD_SCALE", dfti_status
> +        STOP
> +     ENDIF
> +     dfti_status = DftiCommitDescriptor(hand( icurrent )%desc)
> +     IF(dfti_status /= 0)THEN
> +        WRITE(*,*) "stopped in DftiCommitDescriptor", dfti_status
> +        STOP
> +     ENDIF
> +
> #elif defined __ESSL || defined __LINUX_ESSL
> 
>        tscale = 1.0_DP / nz
> @@ -286,11 +366,9 @@
>           CALL FFT_Z_STICK_SINGLE(fw_planz( ip), c(offset), ldz_t)
>        END DO
> !$omp end do
> +!$omp end parallel
>        tscale = 1.0_DP / nz
> -!$omp workshare
>        cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl ) * tscale
> -!$omp end workshare
> -!$omp end parallel
>      ELSE IF (isign > 0) THEN
> !$omp parallel default(none) private(tid,offset,i) shared(c,isign,nsl,bw_planz,ip,cout,ldz) &
> !$omp &        firstprivate(ldz_t)
> @@ -330,6 +408,24 @@
>         CALL dfftw_execute_dft( bw_planz( ip), c, cout)
>      END IF
> 
> +#elif defined __DFTI
> +
> +     IF (isign < 0) THEN
> +        dfti_status = DftiComputeForward(hand(ip)%desc, c )
> +        cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl )
> +        IF(dfti_status /= 0) THEN
> +           WRITE(*,*) "stopped in DftiComputeForward", dfti_status
> +           STOP
> +        ENDIF
> +     ELSE IF (isign > 0) THEN
> +        dfti_status = DftiComputeBackward(hand(ip)%desc, c )
> +        cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl )
> +        IF(dfti_status /= 0) THEN
> +           WRITE(*,*) "stopped in DftiComputeBackward", dfti_status
> +           STOP
> +        ENDIF
> +     END IF
> +
> #elif defined __SCSL
> 
>      IF ( isign < 0 ) THEN
> @@ -415,6 +511,10 @@
> 
>    SUBROUTINE cft_2xy(r, nzl, nx, ny, ldx, ldy, isign, pl2ix)
> 
> +#if defined __DFTI
> +     USE iso_c_binding
> +#endif
> +
> !     driver routine for nzl 2d complex fft's of lengths nx and ny
> !     input : r(ldx*ldy)  complex, transform is in-place
> !     ldx >= nx, ldy >= ny are the physical dimensions of the equivalent
> @@ -448,11 +548,23 @@
>      EXTERNAL :: omp_get_thread_num, omp_get_num_threads
> #endif
> 
> -#if defined __FFTW || defined __FFTW3
> +#if defined __DFTI
> 
> +     TYPE(DFTI_DESCRIPTOR_ARRAY), SAVE :: hand( ndims )
> +     LOGICAL, SAVE :: dfti_first = .TRUE.
> +     INTEGER :: dfti_status = 0
> +
> +#elif defined __FFTW || defined __FFTW3
> +
> +#if defined __FFTW && __FFTW_ALL_XY_PLANES
> +     C_POINTER, SAVE :: fw_plan_2d( ndims ) = 0
> +     C_POINTER, SAVE :: bw_plan_2d( ndims ) = 0
> +#else
>      C_POINTER, SAVE :: fw_plan( 2, ndims ) = 0
>      C_POINTER, SAVE :: bw_plan( 2, ndims ) = 0
> +#endif
> 
> +
> #elif defined __ESSL || defined __LINUX_ESSL
> 
>      INTEGER, PARAMETER :: ltabl = 20000 + 3 * nfftx
> @@ -503,6 +615,14 @@
>      !
>      !   Here initialize table only if necessary
>      !
> +#if defined __DFTI
> +     IF( dfti_first .EQ. .TRUE. ) THEN
> +        DO ip = 1, ndims
> +           hand(ip)%desc => NULL()
> +        END DO
> +        dfti_first = .FALSE.
> +     END IF
> +#endif
> 
>      DO ip = 1, ndims
> 
> @@ -524,8 +644,62 @@
> 
>        ! WRITE( stdout, fmt="('DEBUG cft_2xy, reinitializing tables ', I3)" ) icurrent
> 
> -#if defined __FFTW
> +#if defined __DFTI
> 
> +       if( ASSOCIATED( hand( icurrent )%desc ) ) THEN
> +          dfti_status = DftiFreeDescriptor( hand( icurrent )%desc )
> +          IF( dfti_status /= 0) THEN
> +             WRITE(*,*) "stopped in DftiFreeDescriptor", dfti_status
> +             STOP
> +          ENDIF
> +       END IF
> +
> +       dfti_status = DftiCreateDescriptor(hand( icurrent )%desc, DFTI_DOUBLE, DFTI_COMPLEX, 2,(/nx,ny/))
> +       IF(dfti_status /= 0) THEN
> +          WRITE(*,*) "stopped in DftiCreateDescriptor", dfti_status
> +          STOP
> +       ENDIF
> +       dfti_status = DftiSetValue(hand( icurrent )%desc, DFTI_NUMBER_OF_TRANSFORMS,nzl)
> +       IF(dfti_status /= 0)THEN
> +          WRITE(*,*) "stopped in DFTI_NUMBER_OF_TRANSFORMS", dfti_status
> +          STOP
> +       ENDIF
> +       dfti_status = DftiSetValue(hand( icurrent )%desc,DFTI_INPUT_DISTANCE, ldx*ldy )
> +       IF(dfti_status /= 0)THEN
> +          WRITE(*,*) "stopped in DFTI_INPUT_DISTANCE", dfti_status
> +          STOP
> +       ENDIF
> +       dfti_status = DftiSetValue(hand( icurrent )%desc, DFTI_PLACEMENT, DFTI_INPLACE)
> +       IF(dfti_status /= 0)THEN
> +          WRITE(*,*) "stopped in DFTI_PLACEMENT", dfti_status
> +          STOP
> +       ENDIF
> +       tscale = 1.0_DP/ (nx * ny )
> +       dfti_status = DftiSetValue( hand( icurrent )%desc, DFTI_FORWARD_SCALE, tscale);
> +       IF(dfti_status /= 0)THEN
> +          WRITE(*,*) "stopped in DFTI_FORWARD_SCALE", dfti_status
> +          STOP
> +       ENDIF
> +       dfti_status = DftiSetValue( hand( icurrent )%desc, DFTI_BACKWARD_SCALE, DBLE(1) );
> +       IF(dfti_status /= 0)THEN
> +          WRITE(*,*) "stopped in DFTI_BACKWARD_SCALE", dfti_status
> +          STOP
> +       ENDIF
> +       dfti_status = DftiCommitDescriptor(hand( icurrent )%desc)
> +       IF(dfti_status /= 0)THEN
> +          WRITE(*,*) "stopped in DftiCommitDescriptor", dfti_status
> +          STOP
> +       ENDIF
> +
> +
> +#elif defined __FFTW
> +
> +#if defined __FFTW_ALL_XY_PLANES
> +       IF( fw_plan_2d( icurrent) /= 0 )  CALL DESTROY_PLAN_2D(fw_plan_2d(icurrent) )
> +       IF( bw_plan_2d( icurrent) /= 0 )  CALL DESTROY_PLAN_2D(bw_plan_2d(icurrent) )
> +       idir = -1; CALL CREATE_PLAN_2D( fw_plan_2d(icurrent), nx, ny, idir)
> +       idir =  1; CALL CREATE_PLAN_2D( bw_plan_2d(icurrent), nx, ny, idir)
> +#else
>        IF( fw_plan( 2,icurrent) /= 0 )   CALL DESTROY_PLAN_1D( fw_plan( 2,icurrent) )
>        IF( bw_plan( 2,icurrent) /= 0 )   CALL DESTROY_PLAN_1D( bw_plan( 2,icurrent) )
>        idir = -1; CALL CREATE_PLAN_1D( fw_plan( 2,icurrent), ny, idir)
> @@ -535,6 +709,7 @@
>        IF( bw_plan( 1,icurrent) /= 0 ) CALL DESTROY_PLAN_1D( bw_plan( 1,icurrent) )
>        idir = -1; CALL CREATE_PLAN_1D( fw_plan( 1,icurrent), nx, idir)
>        idir =  1; CALL CREATE_PLAN_1D( bw_plan( 1,icurrent), nx, idir)
> +#endif
> 
> #elif defined __FFTW3
> 
> @@ -645,10 +820,46 @@
> #endif
> 
> 
> -#if defined __FFTW
> +#if defined __DFTI
> 
> -#if defined __OPENMP
> +     IF( isign < 0 ) THEN
> +        !
> +        dfti_status = DftiComputeForward(hand(ip)%desc, r(:))
> +        IF(dfti_status /= 0)THEN
> +           WRITE(*,*) "stopped in DftiComputeForward", dfti_status
> +           STOP
> +        ENDIF
> +        !
> +     ELSE IF( isign > 0 ) THEN
> +        !
> +        dfti_status = DftiComputeBackward(hand(ip)%desc, r(:))
> +        IF(dfti_status /= 0)THEN
> +           WRITE(*,*) "stopped in DftiComputeBackward", dfti_status
> +           STOP
> +        ENDIF
> +        !
> +     END IF
> 
> +
> +#elif defined __FFTW
> +
> +#if defined __FFTW_ALL_XY_PLANES
> +
> +     IF( isign < 0 ) THEN
> +        !
> +        tscale = 1.0_DP / ( nx * ny )
> +        !
> +        CALL fftw_inplace_drv_2d( fw_plan_2d(ip), nzl, r(1), 1, ldx*ldy )
> +        CALL ZDSCAL( ldx * ldy * nzl, tscale, r(1), 1)
> +        !
> +     ELSE IF( isign > 0 ) THEN
> +        !
> +        CALL fftw_inplace_drv_2d( bw_plan_2d(ip), nzl, r(1), 1, ldx*ldy )
> +        !
> +     END IF
> +
> +#elif defined __OPENMP
> +
>      nx_t  = nx
>      ny_t  = ny
>      nzl_t = nzl
> @@ -1980,6 +2191,7 @@
>   !
>   if (mod (n, 2) ==0) nx = n + 1
>   ! for nec vector machines: if n is even increase dimension by 1
> +  !
> #endif
>   !
>   good_fft_dimension = nx
> 
> Modified: trunk/espresso/Modules/wavefunctions.f90
> ===================================================================
> --- trunk/espresso/Modules/wavefunctions.f90	2015-02-23 10:58:36 UTC (rev 11406)
> +++ trunk/espresso/Modules/wavefunctions.f90	2015-02-23 11:14:15 UTC (rev 11407)
> @@ -28,6 +28,7 @@
>      ! electronic wave functions, CPV code
>      ! distributed over gvector and bands
>      !
> +!dir$ attributes align: 4096 :: c0_bgrp, cm_bgrp, phi_bgrp
>      COMPLEX(DP), ALLOCATABLE :: c0_bgrp(:,:)  ! wave functions at time t
>      COMPLEX(DP), ALLOCATABLE :: cm_bgrp(:,:)  ! wave functions at time t-delta t
>      COMPLEX(DP), ALLOCATABLE :: phi_bgrp(:,:) ! |phi> = s'|c0> = |c0> + sum q_ij |i><j|c0>
> 
> _______________________________________________
> Q-e-commits mailing list
> Q-e-commits at qe-forge.org
> http://qe-forge.org/mailman/listinfo/q-e-commits

--
Mr. Filippo SPIGA, M.Sc.
http://filippospiga.info ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert

*****
Disclaimer: "Please note this message and any attachments are CONFIDENTIAL and may be privileged or otherwise protected from disclosure. The contents are not to be disclosed to anyone other than the addressee. Unauthorized recipients are requested to preserve this confidentiality and to advise the sender immediately of any error in transmission."






More information about the developers mailing list