[Q-e-developers] [Q-e-commits] r11407 - trunk/espresso/Modules
Paolo Giannozzi
paolo.giannozzi at uniud.it
Tue Feb 24 16:01:28 CET 2015
On Tue, 2015-02-24 at 14:45 +0000, Filippo Spiga wrote:
> Then Intel compiler has a pre-processor macros that contains the
> compiler version called "__ICC". Just set it to be major of 1310 (so
> Intel 14.x minimum is required to use this compiler directive)
>
>
> #if defined(__ICC)
> #if __ICC > 1310
> !dir$ attributes align: 4096 :: yf, aux
> #endif
> #endif
this seems to work (that is: my compiler doesn't crash any longer)
(it seems it has __ICC not defined, though).
P.
> It should fix the problem. I can apply and test it later today if you
> are ok. Carlo, opinions?
>
>
> F
>
>
> On Feb 24, 2015, at 6:42 AM, Paolo Giannozzi
> <paolo.giannozzi at uniud.it> wrote:
> > It's line 76:
> > !dir$ attributes align: 4096 :: yf, aux
> > that makes my compiler (v.12) cras.
> >
> > P.
> > On Mon, 2015-02-23 at 22:12 +0000, Filippo Spiga wrote:
> > > Ciao Carlo,
> > >
> > > I spot that for a old (how old Paolo G?) version of Intel compiler
> > > the fft_parallel.f90 files with your latest changes generate a
> > > internal compiler error. It happens with and without MPI support.
> > >
> > > See:
> > > - http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%
> > > 28HYBRID%29
> > > - http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%
> > > 28SERIAL%29/builds/4
> > > - http://xiexie.syslab.disco.unimib.it:8010/builders/UDINE%20%
> > > 28SERIAL%29/builds/4/steps/make%20all/logs/stdio
> > >
> > > Did you test these changes using which version of Intel?
> > >
> > > Cheers,
> > > Filippo
> > >
> > > Begin forwarded message:
> > > > Date: February 23, 2015 at 11:14:16 AM GMT
> > > > From: ccavazzoni <ccavazzoni at qeforge.qe-forge.org>
> > > > To: q-e-commits at qe-forge.org
> > > > Subject: [Q-e-commits] r11407 - trunk/espresso/Modules
> > > > Reply-To: Quantum ESPRESSO svn commit messages
> > > > <q-e-commits at qe-forge.org>
> > > >
> > > > Author: ccavazzoni
> > > > Date: 2015-02-23 12:14:15 +0100 (Mon, 23 Feb 2015)
> > > > New Revision: 11407
> > > >
> > > > Modified:
> > > > trunk/espresso/Modules/fft_parallel.f90
> > > > trunk/espresso/Modules/fft_scalar.f90
> > > > trunk/espresso/Modules/wavefunctions.f90
> > > > Log:
> > > > - adding memory alignment directives to have a performance
> > > > improvement on Intel architecure (CPU+Network),
> > > > only meaningful for intel compiler, shold be of no arm for all
> > > > the other
> > > > - Intel DFTI MKL fft interface back in again (with __DFTI) in
> > > > fft_scalar, some issues with the fftw3 interface
> > > > is prevending porting to Xeon PHI processor (no arm to all other
> > > > procs/fft)
> > > > - A split in thress different subs of the general driver
> > > > (tg_cft3s) is added to the fft_parallel module,
> > > > to support software pipelining optimizations, to mask
> > > > communication and data transfer latency.
> > > >
> > > >
> > > > Modified: trunk/espresso/Modules/fft_parallel.f90
> > > > ===================================================================
> > > > --- trunk/espresso/Modules/fft_parallel.f90 2015-02-23 10:58:36
> > > > UTC (rev 11406)
> > > > +++ trunk/espresso/Modules/fft_parallel.f90 2015-02-23 11:14:15
> > > > UTC (rev 11407)
> > > > @@ -73,6 +73,7 @@
> > > > LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > ! specify if you want
> > > > to use task groups parallelization
> > > > !
> > > > +!dir$ attributes align: 4096 :: yf, aux
> > > > INTEGER :: me_p
> > > > INTEGER :: n1, n2, n3, nx1, nx2, nx3
> > > > COMPLEX(DP), ALLOCATABLE :: yf(:), aux (:)
> > > > @@ -329,4 +330,418 @@
> > > > !
> > > > END SUBROUTINE tg_cft3s
> > > > !
> > > > +!
> > > > +!
> > > > +!----------------------------------------------------------------------------
> > > > +SUBROUTINE tg_cft3s_z( f, dfft, aux, isgn, use_task_groups )
> > > > +
> > > > !----------------------------------------------------------------------------
> > > > + !
> > > > + USE fft_scalar, ONLY : cft_1z, cft_2xy
> > > > + USE fft_base, ONLY : fft_scatter
> > > > + USE kinds, ONLY : DP
> > > > + USE fft_types, ONLY : fft_dlay_descriptor
> > > > + USE parallel_include
> > > > +
> > > > + !
> > > > + IMPLICIT NONE
> > > > + !
> > > > + COMPLEX(DP), INTENT(inout) :: f( : ) ! array containing
> > > > data to be transformed
> > > > + COMPLEX(DP), INTENT(inout) :: aux (:)
> > > > + TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> > > > + ! descriptor of fft
> > > > data layout
> > > > + INTEGER, INTENT(in) :: isgn ! fft direction
> > > > + LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > + ! specify if you
> > > > want to use task groups parallelization
> > > > + !
> > > > + INTEGER :: me_p
> > > > + INTEGER :: n1, n2, n3, nx1, nx2, nx3
> > > > + COMPLEX(DP), ALLOCATABLE :: yf(:)
> > > > + INTEGER :: planes( dfft%nr1x )
> > > > + LOGICAL :: use_tg
> > > > + !
> > > > + !
> > > > + IF( present( use_task_groups ) ) THEN
> > > > + use_tg = use_task_groups
> > > > + ELSE
> > > > + use_tg = .false.
> > > > + ENDIF
> > > > + !
> > > > + IF( use_tg .and. .not. dfft%have_task_groups ) &
> > > > + CALL errore( ' tg_cft3s_x ', ' call requiring task groups
> > > > for a descriptor without task groups ', 1 )
> > > > + !
> > > > + n1 = dfft%nr1
> > > > + n2 = dfft%nr2
> > > > + n3 = dfft%nr3
> > > > + nx1 = dfft%nr1x
> > > > + nx2 = dfft%nr2x
> > > > + nx3 = dfft%nr3x
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + ALLOCATE( YF ( dfft%nogrp * dfft%tg_nnr ) )
> > > > + ENDIF
> > > > + !
> > > > + me_p = dfft%mype + 1
> > > > + !
> > > > + IF ( isgn > 0 ) THEN
> > > > + !
> > > > + IF ( isgn /= 2 ) THEN
> > > > + !
> > > > + IF( use_tg ) &
> > > > + CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > + !
> > > > + CALL cft_1z( f, dfft%nsp( me_p ), n3, nx3, isgn, aux )
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + CALL pack_group_sticks()
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + CALL cft_1z( yf, dfft%tg_nsw( me_p ), n3, nx3, isgn,
> > > > aux )
> > > > + ELSE
> > > > + CALL cft_1z( f, dfft%nsw( me_p ), n3, nx3, isgn, aux
> > > > )
> > > > + ENDIF
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + IF ( isgn /= -2 ) THEN
> > > > + !
> > > > + IF( use_tg ) &
> > > > + CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > + !
> > > > + ENDIF
> > > > +
> > > > + IF ( isgn /= -2 ) THEN
> > > > + !
> > > > + CALL cft_1z( aux, dfft%nsp( me_p ), n3, nx3, isgn, f )
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + CALL cft_1z( aux, dfft%tg_nsw( me_p ), n3, nx3,
> > > > isgn, yf )
> > > > + ELSE
> > > > + CALL cft_1z( aux, dfft%nsw( me_p ), n3, nx3, isgn, f
> > > > )
> > > > + ENDIF
> > > > + !
> > > > + CALL unpack_group_sticks()
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + DEALLOCATE( yf )
> > > > + ENDIF
> > > > + !
> > > > + RETURN
> > > > + !
> > > > +CONTAINS
> > > > + !
> > > > + SUBROUTINE pack_group_sticks()
> > > > +
> > > > + INTEGER :: ierr
> > > > + !
> > > > + IF( .not. use_tg ) RETURN
> > > > + !
> > > > + IF( dfft%tg_rdsp(dfft%nogrp) + dfft%tg_rcv(dfft%nogrp) >
> > > > size( yf ) ) THEN
> > > > + CALL errore( 'pack_group_sticks' , ' inconsistent size
> > > > ', 1 )
> > > > + ENDIF
> > > > + IF( dfft%tg_psdsp(dfft%nogrp) + dfft%tg_snd(dfft%nogrp) >
> > > > size( f ) ) THEN
> > > > + CALL errore( 'pack_group_sticks', ' inconsistent size
> > > > ', 2 )
> > > > + ENDIF
> > > > +
> > > > + CALL start_clock( 'ALLTOALL' )
> > > > + !
> > > > + ! Collect all the sticks of the different states,
> > > > + ! in "yf" processors will have all the sticks of the OGRP
> > > > +
> > > > +#if defined __MPI
> > > > +
> > > > + CALL MPI_ALLTOALLV( f(1), dfft%tg_snd, dfft%tg_psdsp,
> > > > MPI_DOUBLE_COMPLEX, yf(1), dfft%tg_rcv, &
> > > > + & dfft%tg_rdsp, MPI_DOUBLE_COMPLEX,
> > > > dfft%ogrp_comm, IERR)
> > > > + IF( ierr /= 0 ) THEN
> > > > + CALL errore( 'pack_group_sticks', ' alltoall error 1 ',
> > > > abs(ierr) )
> > > > + ENDIF
> > > > +
> > > > +#endif
> > > > +
> > > > + CALL stop_clock( 'ALLTOALL' )
> > > > + !
> > > > + !YF Contains all ( ~ NOGRP*dfft%nsw(me) ) Z-sticks
> > > > + !
> > > > + RETURN
> > > > + END SUBROUTINE pack_group_sticks
> > > > +
> > > > + !
> > > > +
> > > > + SUBROUTINE unpack_group_sticks()
> > > > + !
> > > > + ! Bring pencils back to their original distribution
> > > > + !
> > > > + INTEGER :: ierr
> > > > + !
> > > > + IF( .not. use_tg ) RETURN
> > > > + !
> > > > + IF( dfft%tg_usdsp(dfft%nogrp) + dfft%tg_snd(dfft%nogrp) >
> > > > size( f ) ) THEN
> > > > + CALL errore( 'unpack_group_sticks', ' inconsistent size
> > > > ', 3 )
> > > > + ENDIF
> > > > + IF( dfft%tg_rdsp(dfft%nogrp) + dfft%tg_rcv(dfft%nogrp) >
> > > > size( yf ) ) THEN
> > > > + CALL errore( 'unpack_group_sticks', ' inconsistent size
> > > > ', 4 )
> > > > + ENDIF
> > > > +
> > > > + CALL start_clock( 'ALLTOALL' )
> > > > +
> > > > +#if defined __MPI
> > > > + CALL MPI_Alltoallv( yf(1), &
> > > > + dfft%tg_rcv, dfft%tg_rdsp, MPI_DOUBLE_COMPLEX, f(1),
> > > > &
> > > > + dfft%tg_snd, dfft%tg_usdsp, MPI_DOUBLE_COMPLEX, dfft%
> > > > ogrp_comm, IERR)
> > > > + IF( ierr /= 0 ) THEN
> > > > + CALL errore( 'unpack_group_sticks', ' alltoall error 2
> > > > ', abs(ierr) )
> > > > + ENDIF
> > > > +#endif
> > > > +
> > > > + CALL stop_clock( 'ALLTOALL' )
> > > > +
> > > > + RETURN
> > > > + END SUBROUTINE unpack_group_sticks
> > > > + !
> > > > +END SUBROUTINE tg_cft3s_z
> > > > +!
> > > > +!
> > > > +!----------------------------------------------------------------------------
> > > > +SUBROUTINE tg_cft3s_scatter( f, dfft, aux, isgn,
> > > > use_task_groups )
> > > > +
> > > > !----------------------------------------------------------------------------
> > > > + !
> > > > + USE fft_scalar, ONLY : cft_1z, cft_2xy
> > > > + USE fft_base, ONLY : fft_scatter
> > > > + USE kinds, ONLY : DP
> > > > + USE fft_types, ONLY : fft_dlay_descriptor
> > > > + USE parallel_include
> > > > +
> > > > + !
> > > > + IMPLICIT NONE
> > > > + !
> > > > + COMPLEX(DP), INTENT(inout) :: f( : ), aux( : ) ! array
> > > > containing data to be transformed
> > > > + TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> > > > + ! descriptor of fft
> > > > data layout
> > > > + INTEGER, INTENT(in) :: isgn ! fft direction
> > > > + LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > + ! specify if you
> > > > want to use task groups parallelization
> > > > + !
> > > > + INTEGER :: me_p
> > > > + INTEGER :: n1, n2, n3, nx1, nx2, nx3
> > > > + INTEGER :: planes( dfft%nr1x )
> > > > + LOGICAL :: use_tg
> > > > + !
> > > > + !
> > > > + IF( present( use_task_groups ) ) THEN
> > > > + use_tg = use_task_groups
> > > > + ELSE
> > > > + use_tg = .false.
> > > > + ENDIF
> > > > + !
> > > > + IF( use_tg .and. .not. dfft%have_task_groups ) &
> > > > + CALL errore( ' tg_cft3s ', ' call requiring task groups
> > > > for a descriptor without task groups ', 1 )
> > > > + !
> > > > + n1 = dfft%nr1
> > > > + n2 = dfft%nr2
> > > > + n3 = dfft%nr3
> > > > + nx1 = dfft%nr1x
> > > > + nx2 = dfft%nr2x
> > > > + nx3 = dfft%nr3x
> > > > + !
> > > > + me_p = dfft%mype + 1
> > > > + !
> > > > + IF ( isgn > 0 ) THEN
> > > > + !
> > > > + IF ( isgn /= 2 ) THEN
> > > > + !
> > > > + IF( use_tg ) &
> > > > + CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + CALL fw_scatter( isgn ) ! forwart scatter from stick to
> > > > planes
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + IF ( isgn /= -2 ) THEN
> > > > + !
> > > > + IF( use_tg ) &
> > > > + CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + CALL bw_scatter( isgn )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + RETURN
> > > > + !
> > > > +CONTAINS
> > > > + !
> > > > + SUBROUTINE fw_scatter( iopt )
> > > > +
> > > > + !Transpose data for the 2-D FFT on the x-y plane
> > > > + !
> > > > + !NOGRP*dfft%nnr: The length of aux and f
> > > > + !nr3x: The length of each Z-stick
> > > > + !aux: input - output
> > > > + !f: working space
> > > > + !isgn: type of scatter
> > > > + !dfft%nsw(me) holds the number of Z-sticks proc. me
> > > > has.
> > > > + !dfft%npp: number of planes per processor
> > > > + !
> > > > + !
> > > > + USE fft_base, ONLY: fft_scatter
> > > > + !
> > > > + INTEGER, INTENT(in) :: iopt
> > > > + !
> > > > + IF( iopt == 2 ) THEN
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + !
> > > > + CALL fft_scatter( dfft, aux, nx3, dfft%nogrp*dfft%
> > > > tg_nnr, f, dfft%tg_nsw, dfft%tg_npp, iopt, use_tg )
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsw, dfft%npp, iopt )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + ELSEIF( iopt == 1 ) THEN
> > > > + !
> > > > + CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsp, dfft%npp, iopt )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + RETURN
> > > > + END SUBROUTINE fw_scatter
> > > > +
> > > > + !
> > > > +
> > > > + SUBROUTINE bw_scatter( iopt )
> > > > + !
> > > > + USE fft_base, ONLY: fft_scatter
> > > > + !
> > > > + INTEGER, INTENT(in) :: iopt
> > > > + !
> > > > + IF( iopt == -2 ) THEN
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + !
> > > > + CALL fft_scatter( dfft, aux, nx3, dfft%nogrp*dfft%
> > > > tg_nnr, f, dfft%tg_nsw, dfft%tg_npp, iopt, use_tg )
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsw, dfft%npp, iopt )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + ELSEIF( iopt == -1 ) THEN
> > > > + !
> > > > + CALL fft_scatter( dfft, aux, nx3, dfft%nnr, f, dfft%
> > > > nsp, dfft%npp, iopt )
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + RETURN
> > > > + END SUBROUTINE bw_scatter
> > > > + !
> > > > +END SUBROUTINE tg_cft3s_scatter
> > > > +!
> > > > +!----------------------------------------------------------------------------
> > > > +SUBROUTINE tg_cft3s_xy( f, dfft, aux, isgn, use_task_groups )
> > > > +
> > > > !----------------------------------------------------------------------------
> > > > + !
> > > > + USE fft_scalar, ONLY : cft_1z, cft_2xy
> > > > + USE fft_base, ONLY : fft_scatter
> > > > + USE kinds, ONLY : DP
> > > > + USE fft_types, ONLY : fft_dlay_descriptor
> > > > + USE parallel_include
> > > > +
> > > > + !
> > > > + IMPLICIT NONE
> > > > + !
> > > > + COMPLEX(DP), INTENT(inout) :: f( : ), aux( : ) ! array
> > > > containing data to be transformed
> > > > + TYPE (fft_dlay_descriptor), INTENT(in) :: dfft
> > > > + ! descriptor of fft
> > > > data layout
> > > > + INTEGER, INTENT(in) :: isgn ! fft direction
> > > > + LOGICAL, OPTIONAL, INTENT(in) :: use_task_groups
> > > > + ! specify if you
> > > > want to use task groups parallelization
> > > > + !
> > > > + INTEGER :: me_p
> > > > + INTEGER :: n1, n2, n3, nx1, nx2, nx3
> > > > + COMPLEX(DP), ALLOCATABLE :: yf(:)
> > > > + INTEGER :: planes( dfft%nr1x )
> > > > + LOGICAL :: use_tg
> > > > + !
> > > > + !
> > > > + IF( present( use_task_groups ) ) THEN
> > > > + use_tg = use_task_groups
> > > > + ELSE
> > > > + use_tg = .false.
> > > > + ENDIF
> > > > + !
> > > > + IF( use_tg .and. .not. dfft%have_task_groups ) &
> > > > + CALL errore( ' tg_cft3s ', ' call requiring task groups
> > > > for a descriptor without task groups ', 1 )
> > > > + !
> > > > + n1 = dfft%nr1
> > > > + n2 = dfft%nr2
> > > > + n3 = dfft%nr3
> > > > + nx1 = dfft%nr1x
> > > > + nx2 = dfft%nr2x
> > > > + nx3 = dfft%nr3x
> > > > + !
> > > > + me_p = dfft%mype + 1
> > > > + !
> > > > + IF ( isgn > 0 ) THEN
> > > > + !
> > > > + IF ( isgn /= 2 ) THEN
> > > > + !
> > > > + IF( use_tg ) &
> > > > + CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > + !
> > > > + planes = dfft%iplp
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + planes = dfft%iplw
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + IF( use_tg ) THEN
> > > > + CALL cft_2xy( f, dfft%tg_npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes )
> > > > + ELSE
> > > > + CALL cft_2xy( f, dfft%npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes )
> > > > + ENDIF
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + IF ( isgn /= -2 ) THEN
> > > > + !
> > > > + IF( use_tg ) &
> > > > + CALL errore( ' tg_cft3s ', ' task groups on large
> > > > mesh not implemented ', 1 )
> > > > + !
> > > > + planes = dfft%iplp
> > > > + !
> > > > + ELSE
> > > > + !
> > > > + planes = dfft%iplw
> > > > + !
> > > > + ENDIF
> > > > +
> > > > + IF( use_tg ) THEN
> > > > + CALL cft_2xy( f, dfft%tg_npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes )
> > > > + ELSE
> > > > + CALL cft_2xy( f, dfft%npp( me_p ), n1, n2, nx1, nx2,
> > > > isgn, planes)
> > > > + ENDIF
> > > > + !
> > > > + ENDIF
> > > > + !
> > > > + RETURN
> > > > + !
> > > > +END SUBROUTINE tg_cft3s_xy
> > > > +!
> > > > +
> > > > END MODULE fft_parallel
> > > >
> > > > Modified: trunk/espresso/Modules/fft_scalar.f90
> > > > ===================================================================
> > > > --- trunk/espresso/Modules/fft_scalar.f90 2015-02-23 10:58:36
> > > > UTC (rev 11406)
> > > > +++ trunk/espresso/Modules/fft_scalar.f90 2015-02-23 11:14:15
> > > > UTC (rev 11407)
> > > > @@ -16,11 +16,20 @@
> > > > !--------------------------------------------------------------------------!
> > > >
> > > > #include "fft_defs.h"
> > > > +
> > > > +#if defined __DFTI
> > > > +#include "mkl_dfti.f90"
> > > > +#endif
> > > > +
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > > MODULE fft_scalar
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > > USE kinds
> > > >
> > > > +#if defined __DFTI
> > > > + USE MKL_DFTI ! -- this can be found int he MKL include
> > > > directory
> > > > +#endif
> > > > +
> > > > IMPLICIT NONE
> > > > SAVE
> > > >
> > > > @@ -77,6 +86,12 @@
> > > >
> > > > #endif
> > > >
> > > > +#if defined __DFTI
> > > > + TYPE dfti_descriptor_array
> > > > + TYPE(DFTI_DESCRIPTOR), POINTER :: desc
> > > > + END TYPE
> > > > +#endif
> > > > +
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > > CONTAINS
> > > > !
> > > > =----------------------------------------------------------------------=!
> > > > @@ -95,6 +110,10 @@
> > > >
> > > > SUBROUTINE cft_1z(c, nsl, nz, ldz, isign, cout)
> > > >
> > > > +#if defined __DFTI
> > > > + USE iso_c_binding
> > > > +#endif
> > > > +
> > > > ! driver routine for nsl 1d complex fft's of length nz
> > > > ! ldz >= nz is the distance between sequences to be
> > > > transformed
> > > > ! (ldz>nz is used on some architectures to reduce memory
> > > > conflicts)
> > > > @@ -142,6 +161,21 @@
> > > > C_POINTER, SAVE :: fw_planz( ndims ) = 0
> > > > C_POINTER, SAVE :: bw_planz( ndims ) = 0
> > > >
> > > > +#elif defined __DFTI
> > > > +
> > > > + ! Intel MKL native FFT driver
> > > > +
> > > > + TYPE(DFTI_DESCRIPTOR_ARRAY), SAVE :: hand( ndims )
> > > > + LOGICAL, SAVE :: dfti_first = .TRUE.
> > > > + INTEGER :: dfti_status = 0
> > > > + !
> > > > + IF( dfti_first .EQ. .TRUE. ) THEN
> > > > + DO ip = 1, ndims
> > > > + hand(ip)%desc => NULL()
> > > > + END DO
> > > > + dfti_first = .FALSE.
> > > > + END IF
> > > > +
> > > > #elif defined __ESSL || defined __LINUX_ESSL
> > > >
> > > > ! ESSL IBM library: see the ESSL manual for DCFT
> > > > @@ -207,7 +241,6 @@
> > > > ! initialize a new one
> > > >
> > > > ! WRITE( stdout, fmt="('DEBUG cft_1z, reinitializing
> > > > tables ', I3)" ) icurrent
> > > > -
> > > > #if defined __FFTW
> > > >
> > > > IF( fw_planz( icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_1D( fw_planz( icurrent) )
> > > > @@ -227,6 +260,53 @@
> > > > CALL dfftw_plan_many_dft( bw_planz( icurrent), 1, nz, nsl,
> > > > c, &
> > > > (/SIZE(c)/), 1, ldz, cout, (/SIZE(cout)/), 1, ldz,
> > > > idir, FFTW_ESTIMATE)
> > > >
> > > > +#elif defined __DFTI
> > > > +
> > > > + if( ASSOCIATED( hand( icurrent )%desc ) ) THEN
> > > > + dfti_status = DftiFreeDescriptor( hand( icurrent )%
> > > > desc )
> > > > + IF( dfti_status /= 0) THEN
> > > > + WRITE(*,*) "stopped in DftiFreeDescriptor",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + END IF
> > > > +
> > > > + dfti_status = DftiCreateDescriptor(hand( icurrent )%desc,
> > > > DFTI_DOUBLE, DFTI_COMPLEX, 1,nz)
> > > > + IF(dfti_status /= 0) THEN
> > > > + WRITE(*,*) "stopped in DftiCreateDescriptor",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_NUMBER_OF_TRANSFORMS,nsl)
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_NUMBER_OF_TRANSFORMS",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue(hand( icurrent )%
> > > > desc,DFTI_INPUT_DISTANCE, ldz )
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_INPUT_DISTANCE",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_PLACEMENT, DFTI_INPLACE)
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_PLACEMENT", dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + tscale = 1.0_DP/nz
> > > > + dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_FORWARD_SCALE, tscale);
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_FORWARD_SCALE", dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_BACKWARD_SCALE, DBLE(1) );
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_BACKWARD_SCALE",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiCommitDescriptor(hand( icurrent )%desc)
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DftiCommitDescriptor",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > +
> > > > #elif defined __ESSL || defined __LINUX_ESSL
> > > >
> > > > tscale = 1.0_DP / nz
> > > > @@ -286,11 +366,9 @@
> > > > CALL FFT_Z_STICK_SINGLE(fw_planz( ip), c(offset),
> > > > ldz_t)
> > > > END DO
> > > > !$omp end do
> > > > +!$omp end parallel
> > > > tscale = 1.0_DP / nz
> > > > -!$omp workshare
> > > > cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl ) * tscale
> > > > -!$omp end workshare
> > > > -!$omp end parallel
> > > > ELSE IF (isign > 0) THEN
> > > > !$omp parallel default(none) private(tid,offset,i)
> > > > shared(c,isign,nsl,bw_planz,ip,cout,ldz) &
> > > > !$omp & firstprivate(ldz_t)
> > > > @@ -330,6 +408,24 @@
> > > > CALL dfftw_execute_dft( bw_planz( ip), c, cout)
> > > > END IF
> > > >
> > > > +#elif defined __DFTI
> > > > +
> > > > + IF (isign < 0) THEN
> > > > + dfti_status = DftiComputeForward(hand(ip)%desc, c )
> > > > + cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl )
> > > > + IF(dfti_status /= 0) THEN
> > > > + WRITE(*,*) "stopped in DftiComputeForward",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + ELSE IF (isign > 0) THEN
> > > > + dfti_status = DftiComputeBackward(hand(ip)%desc, c )
> > > > + cout( 1 : ldz * nsl ) = c( 1 : ldz * nsl )
> > > > + IF(dfti_status /= 0) THEN
> > > > + WRITE(*,*) "stopped in DftiComputeBackward",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + END IF
> > > > +
> > > > #elif defined __SCSL
> > > >
> > > > IF ( isign < 0 ) THEN
> > > > @@ -415,6 +511,10 @@
> > > >
> > > > SUBROUTINE cft_2xy(r, nzl, nx, ny, ldx, ldy, isign, pl2ix)
> > > >
> > > > +#if defined __DFTI
> > > > + USE iso_c_binding
> > > > +#endif
> > > > +
> > > > ! driver routine for nzl 2d complex fft's of lengths nx and
> > > > ny
> > > > ! input : r(ldx*ldy) complex, transform is in-place
> > > > ! ldx >= nx, ldy >= ny are the physical dimensions of the
> > > > equivalent
> > > > @@ -448,11 +548,23 @@
> > > > EXTERNAL :: omp_get_thread_num, omp_get_num_threads
> > > > #endif
> > > >
> > > > -#if defined __FFTW || defined __FFTW3
> > > > +#if defined __DFTI
> > > >
> > > > + TYPE(DFTI_DESCRIPTOR_ARRAY), SAVE :: hand( ndims )
> > > > + LOGICAL, SAVE :: dfti_first = .TRUE.
> > > > + INTEGER :: dfti_status = 0
> > > > +
> > > > +#elif defined __FFTW || defined __FFTW3
> > > > +
> > > > +#if defined __FFTW && __FFTW_ALL_XY_PLANES
> > > > + C_POINTER, SAVE :: fw_plan_2d( ndims ) = 0
> > > > + C_POINTER, SAVE :: bw_plan_2d( ndims ) = 0
> > > > +#else
> > > > C_POINTER, SAVE :: fw_plan( 2, ndims ) = 0
> > > > C_POINTER, SAVE :: bw_plan( 2, ndims ) = 0
> > > > +#endif
> > > >
> > > > +
> > > > #elif defined __ESSL || defined __LINUX_ESSL
> > > >
> > > > INTEGER, PARAMETER :: ltabl = 20000 + 3 * nfftx
> > > > @@ -503,6 +615,14 @@
> > > > !
> > > > ! Here initialize table only if necessary
> > > > !
> > > > +#if defined __DFTI
> > > > + IF( dfti_first .EQ. .TRUE. ) THEN
> > > > + DO ip = 1, ndims
> > > > + hand(ip)%desc => NULL()
> > > > + END DO
> > > > + dfti_first = .FALSE.
> > > > + END IF
> > > > +#endif
> > > >
> > > > DO ip = 1, ndims
> > > >
> > > > @@ -524,8 +644,62 @@
> > > >
> > > > ! WRITE( stdout, fmt="('DEBUG cft_2xy, reinitializing
> > > > tables ', I3)" ) icurrent
> > > >
> > > > -#if defined __FFTW
> > > > +#if defined __DFTI
> > > >
> > > > + if( ASSOCIATED( hand( icurrent )%desc ) ) THEN
> > > > + dfti_status = DftiFreeDescriptor( hand( icurrent )%
> > > > desc )
> > > > + IF( dfti_status /= 0) THEN
> > > > + WRITE(*,*) "stopped in DftiFreeDescriptor",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + END IF
> > > > +
> > > > + dfti_status = DftiCreateDescriptor(hand( icurrent )%
> > > > desc, DFTI_DOUBLE, DFTI_COMPLEX, 2,(/nx,ny/))
> > > > + IF(dfti_status /= 0) THEN
> > > > + WRITE(*,*) "stopped in DftiCreateDescriptor",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_NUMBER_OF_TRANSFORMS,nzl)
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_NUMBER_OF_TRANSFORMS",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue(hand( icurrent )%
> > > > desc,DFTI_INPUT_DISTANCE, ldx*ldy )
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_INPUT_DISTANCE",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue(hand( icurrent )%desc,
> > > > DFTI_PLACEMENT, DFTI_INPLACE)
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_PLACEMENT", dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + tscale = 1.0_DP/ (nx * ny )
> > > > + dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_FORWARD_SCALE, tscale);
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_FORWARD_SCALE",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiSetValue( hand( icurrent )%desc,
> > > > DFTI_BACKWARD_SCALE, DBLE(1) );
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DFTI_BACKWARD_SCALE",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + dfti_status = DftiCommitDescriptor(hand( icurrent )%
> > > > desc)
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DftiCommitDescriptor",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > +
> > > > +
> > > > +#elif defined __FFTW
> > > > +
> > > > +#if defined __FFTW_ALL_XY_PLANES
> > > > + IF( fw_plan_2d( icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_2D(fw_plan_2d(icurrent) )
> > > > + IF( bw_plan_2d( icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_2D(bw_plan_2d(icurrent) )
> > > > + idir = -1; CALL CREATE_PLAN_2D( fw_plan_2d(icurrent),
> > > > nx, ny, idir)
> > > > + idir = 1; CALL CREATE_PLAN_2D( bw_plan_2d(icurrent),
> > > > nx, ny, idir)
> > > > +#else
> > > > IF( fw_plan( 2,icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_1D( fw_plan( 2,icurrent) )
> > > > IF( bw_plan( 2,icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_1D( bw_plan( 2,icurrent) )
> > > > idir = -1; CALL CREATE_PLAN_1D( fw_plan( 2,icurrent), ny,
> > > > idir)
> > > > @@ -535,6 +709,7 @@
> > > > IF( bw_plan( 1,icurrent) /= 0 ) CALL
> > > > DESTROY_PLAN_1D( bw_plan( 1,icurrent) )
> > > > idir = -1; CALL CREATE_PLAN_1D( fw_plan( 1,icurrent), nx,
> > > > idir)
> > > > idir = 1; CALL CREATE_PLAN_1D( bw_plan( 1,icurrent), nx,
> > > > idir)
> > > > +#endif
> > > >
> > > > #elif defined __FFTW3
> > > >
> > > > @@ -645,10 +820,46 @@
> > > > #endif
> > > >
> > > >
> > > > -#if defined __FFTW
> > > > +#if defined __DFTI
> > > >
> > > > -#if defined __OPENMP
> > > > + IF( isign < 0 ) THEN
> > > > + !
> > > > + dfti_status = DftiComputeForward(hand(ip)%desc, r(:))
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DftiComputeForward",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + !
> > > > + ELSE IF( isign > 0 ) THEN
> > > > + !
> > > > + dfti_status = DftiComputeBackward(hand(ip)%desc, r(:))
> > > > + IF(dfti_status /= 0)THEN
> > > > + WRITE(*,*) "stopped in DftiComputeBackward",
> > > > dfti_status
> > > > + STOP
> > > > + ENDIF
> > > > + !
> > > > + END IF
> > > >
> > > > +
> > > > +#elif defined __FFTW
> > > > +
> > > > +#if defined __FFTW_ALL_XY_PLANES
> > > > +
> > > > + IF( isign < 0 ) THEN
> > > > + !
> > > > + tscale = 1.0_DP / ( nx * ny )
> > > > + !
> > > > + CALL fftw_inplace_drv_2d( fw_plan_2d(ip), nzl, r(1), 1,
> > > > ldx*ldy )
> > > > + CALL ZDSCAL( ldx * ldy * nzl, tscale, r(1), 1)
> > > > + !
> > > > + ELSE IF( isign > 0 ) THEN
> > > > + !
> > > > + CALL fftw_inplace_drv_2d( bw_plan_2d(ip), nzl, r(1), 1,
> > > > ldx*ldy )
> > > > + !
> > > > + END IF
> > > > +
> > > > +#elif defined __OPENMP
> > > > +
> > > > nx_t = nx
> > > > ny_t = ny
> > > > nzl_t = nzl
> > > > @@ -1980,6 +2191,7 @@
> > > > !
> > > > if (mod (n, 2) ==0) nx = n + 1
> > > > ! for nec vector machines: if n is even increase dimension by 1
> > > > + !
> > > > #endif
> > > > !
> > > > good_fft_dimension = nx
> > > >
> > > > Modified: trunk/espresso/Modules/wavefunctions.f90
> > > > ===================================================================
> > > > --- trunk/espresso/Modules/wavefunctions.f90 2015-02-23 10:58:36
> > > > UTC (rev 11406)
> > > > +++ trunk/espresso/Modules/wavefunctions.f90 2015-02-23 11:14:15
> > > > UTC (rev 11407)
> > > > @@ -28,6 +28,7 @@
> > > > ! electronic wave functions, CPV code
> > > > ! distributed over gvector and bands
> > > > !
> > > > +!dir$ attributes align: 4096 :: c0_bgrp, cm_bgrp, phi_bgrp
> > > > COMPLEX(DP), ALLOCATABLE :: c0_bgrp(:,:) ! wave functions
> > > > at time t
> > > > COMPLEX(DP), ALLOCATABLE :: cm_bgrp(:,:) ! wave functions
> > > > at time t-delta t
> > > > COMPLEX(DP), ALLOCATABLE :: phi_bgrp(:,:) ! |phi> = s'|c0> =
> > > > |c0> + sum q_ij |i><j|c0>
> > > >
> > > > _______________________________________________
> > > > Q-e-commits mailing list
> > > > Q-e-commits at qe-forge.org
> > > > http://qe-forge.org/mailman/listinfo/q-e-commits
> > >
> > > --
> > > Mr. Filippo SPIGA, M.Sc.
> > > http://filippospiga.info ~ skype: filippo.spiga
> > >
> > > «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
> > >
> > > *****
> > > Disclaimer: "Please note this message and any attachments are
> > > CONFIDENTIAL and may be privileged or otherwise protected from
> > > disclosure. The contents are not to be disclosed to anyone other
> > > than the addressee. Unauthorized recipients are requested to
> > > preserve this confidentiality and to advise the sender immediately
> > > of any error in transmission."
> > >
> > >
> >
> >
>
> --
> Mr. Filippo SPIGA, M.Sc.
> http://filippospiga.info ~ skype: filippo.spiga
>
> «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
>
> *****
> Disclaimer: "Please note this message and any attachments are
> CONFIDENTIAL and may be privileged or otherwise protected from
> disclosure. The contents are not to be disclosed to anyone other than
> the addressee. Unauthorized recipients are requested to preserve this
> confidentiality and to advise the sender immediately of any error in
> transmission."
>
>
>
>
--
Paolo Giannozzi, Dept. Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222
More information about the developers
mailing list