[QE-users] Error during diagonalization (memcpy, zhegvdx_gpu) in nscf with many bands (GPU)

Sara Postorino sara.pst at gmail.com
Sat Aug 29 18:33:50 CEST 2020


Hi QE users,

I am running PW on Marconi100 and experiencing problems during
diagonalization. I am using version 6.5 (autoload of the modules on m100).
My system is a MoTe2 bilayer with a 39x39x1 k mesh and many bands, because
I will do a GW calculation on top of it. (The calculation works if I do not
add many bands.)
I tried with 4000 and 3000 bands using Davidson diagonalization, running on
18 nodes:
Parallel version (MPI & OpenMP), running on    2304 processor cores
     Number of MPI processes:                72
     Threads/MPI process:                    32
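
As a quick sanity check of the layout reported above, the numbers can be combined as in the small sketch below. It assumes the MPI ranks are spread evenly over the 18 nodes, which the output itself does not state.

```python
# Job layout as reported in the pw.x header above.
nodes = 18
mpi_processes = 72
threads_per_process = 32

# Assumption: ranks are distributed evenly across the nodes.
ranks_per_node, leftover = divmod(mpi_processes, nodes)

# pw.x reports MPI processes x threads as "processor cores".
total_threads = mpi_processes * threads_per_process

print(f"MPI ranks per node: {ranks_per_node} (leftover: {leftover})")
print(f"MPI processes x threads: {total_threads}")
```

Under that assumption there are 4 ranks per node, which would correspond to one rank per GPU on a node with four GPUs, and the product of processes and threads matches the 2304 in the header.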
When doing the calculation of the first k-point I get:

 Really copied g2kin H->D
 Really copied evc H->D
 Really copied et H->D
 Really copied vrs H->D
 dp_memcpy_d2h_c2dinvalid pitch argument           12
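
For a sense of scale at these band counts, the sketch below roughly estimates the size of one dense reduced matrix of the kind Davidson diagonalization works with. It is only an order-of-magnitude estimate and is not claimed to be the cause of the error. It assumes the reduced space has dimension nbnd times a Davidson factor (2 here; the value actually used in this run is not shown in the output), and that the matrix holds double-precision complex numbers (16 bytes each). The function name `reduced_matrix_gib` and its parameters are hypothetical, introduced only for this illustration.

```python
def reduced_matrix_gib(nbnd, david_ndim=2, bytes_per_element=16):
    """Size in GiB of one dense (nbnd*david_ndim)^2 complex matrix.

    Assumptions (not taken from the run above): david_ndim is the
    Davidson workspace factor, and elements are double-precision complex.
    """
    n = nbnd * david_ndim
    return n * n * bytes_per_element / 2**30


# Band counts tried in this post.
for nbnd in (1000, 3000, 4000):
    print(f"nbnd = {nbnd}: {reduced_matrix_gib(nbnd):.2f} GiB per reduced matrix")
```

The quadratic growth is the point: going from 1000 to 4000 bands makes each such matrix 16 times larger, and several matrices of this size are typically needed at once.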

I also tried the conjugate gradient algorithm, but it gets stuck at

 Really copied evc H->D
 Really copied et H->D
 Really copied h_diag H->D
 Really copied becp%nc H->D
 Really copied g2kin H->D
 Really copied vrs H->D

And here it takes forever. I left it running for more than 1 hour and it
didn't finish one k-point; since I have 147 k-points, the computation
would be very expensive even if it worked.

I also tried going down to 1000 bands (I need many more) and got:
 Really copied g2kin H->D
 Really copied evc H->D
 Really copied et H->D
 Really copied vrs H->D
 zhegvdx_gpu error: cusolverDnZpotrf failed!

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine  cdiaghg_gpu (1):
      zhegvdx_gpu failed
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Do you have any suggestions on how to fix this issue?
Thanks

Sara Postorino
PhD student
University of Rome Tor Vergata

