[Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

Ye Luo xw111luoye at gmail.com
Mon Oct 31 02:03:43 CET 2016


Is kunit=1 ?
Is your nkf and nkf_tot the same as nks nks_tot provided by QE environment?
Could you print the nbase for each processor and see if they are the
numbers expected?

Ye

===================
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory

2016-10-30 18:59 GMT-05:00 Vahid Askarpour <vh261281 at dal.ca>:

> I allocate an array that needs to be calculated among pools using:
>
> ALLOCATE(A(3,nbnd,nkf)), where nkf is the number of k points per pool and
> initialized as:
> A(:,:,:)=0
>
> Once array A is calculated, I allocate array A_tot as:
>
> ALLOCATE(A_tot(3,nbnd,nkf_tot)), where nkf_tot is the total number of k
> points. and initialized as:
> A_tot(:,:,:)=0
>
> Then I call the poolgather1 as below:
>
> #if defined(__MPI)
> CALL poolgather1(3,nbnd,nkf_tot,nkf,A,A_tot)
> #else
> A_tot=A
> #endif
>
> I have tried both allocating A_tot outside and inside the #if defined
> section.
>
> Poolgather1 is similar to other EPW poolgather and QE poolcollect routines
> and is given below:
>
> subroutine poolgather1 (nsize1,nsize2,nkstot,nks, f_in, f_out)
>   !--------------------------------------------------------------------
>   USE kinds,     ONLY : DP
>   USE mp_global, ONLY : my_pool_id, inter_pool_comm, npool,kunit
>   USE mp,        ONLY : mp_barrier, mp_bcast,mp_sum
>   USE mp_world,  ONLY : mpime
>   implicit none
>   !
>   INTEGER, INTENT (in) :: nsize1
>   !! first dimension of vectors f_in and f_out
>   INTEGER, INTENT (in) :: nsize2
>   !! second dimension of vectors f_in and f_out
>   INTEGER, INTENT (in) :: nks
>   !! number of k-points per pool
>   INTEGER, INTENT (in) :: nkstot
>   !! total number of k-points
>   REAL (KIND=DP), INTENT (in) :: f_in(nsize1,nsize2,nks)
>   ! input ( only for k-points of mypool )
>   REAL (KIND=DP), INTENT (out)  :: f_out(nsize1,nsize2,nkstot)
>   ! output  ( contains values for all k-point )
>   !
> #if defined(__MPI)
>   INTEGER :: rest, nbase
>   ! the rest of the integer division nkstot / npool
>   ! the position in the original list
>   rest = nkstot / kunit - ( nkstot / kunit / npool ) * npool
>   !
>   nbase = nks * my_pool_id
>   !
>   IF ( ( my_pool_id + 1 ) > rest ) nbase = nbase + rest * kunit
>   !
>   f_out = 0.d0
>   f_out(1:nsize1,1:nsize2,(nbase+1):(nbase+nks)) =
> f_in(1:nsize1,1:nsize2,1:nks)
>   !
>   ! ... reduce across the pools
>   !
>   CALL mp_sum(f_out,inter_pool_comm)
>   !
> #else
>   f_out(:,:,:) = f_in(:,:,:)
>   !
> #endif
>   !
>   end subroutine poolgather1
>
> For my test case, nsize1=3, nsize2=4, nkstot=47.
>
> Thanks,
>
> Vahid
>
>
>
>
> On Oct 30, 2016, at 8:19 PM, Ye Luo <xw111luoye at gmail.com> wrote:
>
> mp_sum_rt is interfaced with mp_sum for 3d array. When you mp_sum a 3d
> array, it will be automatically called.
> USE mp,   ONLY : mp_sum_rt is not necessary.
> If you want to check if mp_sum_rt is implicitly called via interface, you
> can print something before and after your mp_sum call and also print
> something in mp_sum_rt. You should find the expected printing behaviour.
>
> if you compare the mp_sum_rt with the 2d and 4d implementation mp_sum_rm
> and mp_sum_r4d, you should find they flat the dimensionality and call the
> same 1-d routine reduce_base_real.
>
> As I asked earlier, how did you allocate your 3-D arrays input and output?
> Are you allocating sufficient size for the output, did you deallocate it by
> mistake? How large is output array size, if it exceeds 2^32, you might hit
> the 32bit integer bug of the variable msglen.
>
> Ye
>
> ===================
> Ye Luo, Ph.D.
> Leadership Computing Facility
> Argonne National Laboratory
>
> 2016-10-30 14:25 GMT-05:00 Vahid Askarpour <vh261281 at dal.ca>:
>
>> Hi Ye,
>>
>> The changes I am making are part of the EPW/QE code and so the
>> parallelization depends on the QE MPI routines. I have no problem using
>> mp_sum for a 2-D array such as
>> energy(iband,ik). I can use mp_sum to add the contributions from various
>> nodes and print the total in agreement with the output from the serial run.
>> As far as the allocation goes, I use the same 2-D approach to my 3-D array.
>>
>> The problem arises when I use the 3-D array with mp_sum. I noticed that
>> in the Modules/mp.f90 file, there is a subroutine as follows:
>>
>>       SUBROUTINE mp_sum_rt( msg, gid )
>>         IMPLICIT NONE
>>         REAL (DP), INTENT (INOUT) :: msg(:,:,:)
>>         INTEGER, INTENT(IN) :: gid
>> #if defined(__MPI)
>>         INTEGER :: msglen
>>         msglen = size(msg)
>>         CALL reduce_base_real( msglen, msg, gid, -1 )
>> #endif
>>       END SUBROUTINE mp_sum_rt
>>
>>
>> Here  msg has three dimensions. There are other subroutines for 1, 2, and
>> 4 dimensions. Somehow, if I can get  EPW to use the above subroutine, it
>> might work. Adding a statement like:
>>
>> USE mp,   ONLY : mp_sum_rt
>>
>> at the beginning of the file does not work as the compilation fails
>> because EPW does not see the subroutine. Interestingly, it does see mp_sum
>> which is in the same mp.f90 file.
>>
>> So is there a way that I can get EPW to see mp_sum_rt? This might solve
>> my problem.
>>
>> Since I know that mp_sum works with 2-D arrays, one option is to rewrite
>> my files in 2-D, one array for each x,y,z direction. I am trying to avoid
>> this.
>>
>> Thanks,
>>
>> Vahid
>>
>>
>> On Oct 30, 2016, at 3:59 PM, Ye Luo <xw111luoye at gmail.com> wrote:
>>
>> Hi Vahid,
>> segfault in mp_sum doesn't necessarily mean a problem there. Probably you
>> wrote something to output array but not in a valid place before mp_sum.
>> Try to check your allocation of output and the copy make sure they are
>> correct.
>>
>> Ye
>>
>>
>> ===================
>> Ye Luo, Ph.D.
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> 2016-10-28 14:33 GMT-05:00 Vahid Askarpour <vh261281 at dal.ca>:
>>
>>> Hi Ye,
>>>
>>> Thank you for your suggestion. I tried it and when I ran the code, it
>>> seg-faulted. I put flags in the code to see where the segmentation faults
>>> occurs. It happens as the code calls mp_sum. It seems that mp_sum may not
>>> be able to handle this reduction.
>>>
>>> Cheers,
>>> Vahid
>>>
>>> On Oct 28, 2016, at 2:51 PM, Ye Luo <xw111luoye at gmail.com> wrote:
>>>
>>> In Fortran, whatever-D array is 1-D array. mp_sum should be fine.
>>> I saw something strange in your code that you were not copying the right
>>> things as you expected.
>>> How about the following?
>>> output(1:3,1:nbnds,(k_pool*pool_id+1:k_pool*pool_id+k_pool))
>>> =input(1:3,1:nbnds,1:k_pool)
>>>
>>> Ye
>>>
>>> ===================
>>> Ye Luo, Ph.D.
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>> 2016-10-28 12:29 GMT-05:00 Vahid Askarpour <vh261281 at dal.ca>:
>>>
>>>> Dear QE Users,
>>>>
>>>> I am working on some modifications to the QE-6.0 code using symmetry.
>>>> When I try to combine a 3-D array scattered across nodes, I use the
>>>> following:
>>>>
>>>> output(3,nbnds,(k_pool*pool_id+1:k_pool*pool_id+k_pool))=inp
>>>> ut(3,nbnds,1:k_pool)
>>>>
>>>> Here, nbnds is the number of bands, k_pool is the number of k
>>>> points/pool, and pool_id is the id of the pool. Here I am assuming the the
>>>> number of k points is divisible by the number of pools.
>>>>
>>>> Then I call mp_sum(output,inter_pool_comm) to put all the segments of
>>>> input across the nodes  into one output file.
>>>>
>>>> When I run the modified QE code in parallel, the output file is
>>>> different from the serial run.
>>>>
>>>> Does the QE's mp_sum allow the above operation for a three-D array?
>>>>
>>>> Any hints or suggestions would be greatly appreciated.
>>>>
>>>> Vahid
>>>>
>>>> Vahid Askarpour
>>>> Department of Physics and Atmospheric Science
>>>> Dalhousie University,
>>>> Halifax, NS, Canada
>>>> _______________________________________________
>>>> Pw_forum mailing list
>>>> Pw_forum at pwscf.org
>>>> http://pwscf.org/mailman/listinfo/pw_forum
>>>>
>>>
>>> _______________________________________________
>>> Pw_forum mailing list
>>> Pw_forum at pwscf.org
>>> http://pwscf.org/mailman/listinfo/pw_forum
>>>
>>>
>>>
>>> _______________________________________________
>>> Pw_forum mailing list
>>> Pw_forum at pwscf.org
>>> http://pwscf.org/mailman/listinfo/pw_forum
>>>
>>
>> _______________________________________________
>> Pw_forum mailing list
>> Pw_forum at pwscf.org
>> http://pwscf.org/mailman/listinfo/pw_forum
>>
>>
>>
>> _______________________________________________
>> Pw_forum mailing list
>> Pw_forum at pwscf.org
>> http://pwscf.org/mailman/listinfo/pw_forum
>>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20161030/8847aaa6/attachment.html>


More information about the users mailing list