[QE-users] Error during diagonalization (memcpy, zhegvdx_gpu) in nscf with many bands (GPU)
Sara Postorino
sara.pst at gmail.com
Sat Aug 29 18:33:50 CEST 2020
Hi QE users,
I am running PW on Marconi100 and experiencing problems during
diagonalization. I am using version 6.5 (autoload of the modules on m100).
My system is a MoTe2 bilayer with a 39x39x1 k mesh and many bands, because
I will do a GW calculation on top of it. (The calculation works if I
do not add many bands.)
I tried with 4000 and 3000 bands using Davidson diagonalization running on
18 nodes:
Parallel version (MPI & OpenMP), running on 2304 processor cores
Number of MPI processes: 72
Threads/MPI process: 32
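As a quick consistency check on the layout reported above, the numbers can be recombined as follows. This is only a sketch using the figures from the header; the variable names are illustrative, and the even split of MPI processes across nodes is an assumption.

```python
# Figures quoted in the pw.x header above.
nodes = 18
mpi_processes = 72
threads_per_process = 32

# Total threads requested by the job.
total_threads = mpi_processes * threads_per_process

# Assumption: MPI processes are spread evenly over the nodes.
ranks_per_node = mpi_processes // nodes
threads_per_node = ranks_per_node * threads_per_process

print(total_threads)     # 2304, matches "running on 2304 processor cores"
print(ranks_per_node)    # 4
print(threads_per_node)  # 128
```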
When doing the calculation of the first k-point I get:
Really copied g2kin H->D
Really copied evc H->D
Really copied et H->D
Really copied vrs H->D
dp_memcpy_d2h_c2dinvalid pitch argument 12
I also tried the conjugate gradient algorithm, but it gets stuck at:
Really copied evc H->D
Really copied et H->D
Really copied h_diag H->D
Really copied becp%nc H->D
Really copied g2kin H->D
Really copied vrs H->D
And here it takes forever. I left it running for more than 1 hour and it
didn't finish one k-point, and since I have 147 k-points the computation
would be very expensive even if it worked.
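To put a rough number on "very expensive": a lower-bound estimate, assuming every k-point costs at least as much as the first one and that the k-points are processed one after another on the same 18 nodes (both are assumptions, not measurements).

```python
k_points = 147
nodes = 18
# Lower bound only: the first k-point had not finished after 1 hour.
hours_per_k_point = 1.0

# Assumption: k-points are handled sequentially with similar cost each.
min_wall_hours = k_points * hours_per_k_point
min_node_hours = min_wall_hours * nodes

print(min_wall_hours)  # 147.0 hours of wall time, i.e. more than 6 days
print(min_node_hours)  # 2646.0 node-hours
```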
I also tried going down to 1000 bands (I need way more) and got:
Really copied g2kin H->D
Really copied evc H->D
Really copied et H->D
Really copied vrs H->D
zhegvdx_gpu error: cusolverDnZpotrf failed!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Error in routine cdiaghg_gpu (1):
zhegvdx_gpu failed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
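Since the failures depend on the number of bands, one thing that may be worth checking is the size of the dense matrices involved in the diagonalization. The sketch below is only a back-of-the-envelope estimate: it assumes the solver works with square double-complex matrices of dimension `ndim * nbnd`, where `ndim` is a hypothetical subspace factor (the value 4 used here is an assumption, not read from this run), and it does not claim that this is the cause of the errors above.

```python
BYTES_PER_DOUBLE_COMPLEX = 16  # two 8-byte floats


def dense_matrix_gib(nbnd, ndim):
    """Size in GiB of one (ndim*nbnd) x (ndim*nbnd) double-complex matrix."""
    n = ndim * nbnd
    return n * n * BYTES_PER_DOUBLE_COMPLEX / 2**30


# Band counts tried in this post; ndim=4 is an assumed subspace factor.
for nbnd in (1000, 3000, 4000):
    print(nbnd, round(dense_matrix_gib(nbnd, ndim=4), 2))
```

A solver typically needs more than one such matrix at a time, so the per-matrix figure should be multiplied accordingly before comparing it with the memory available on each GPU.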
Do you have any suggestions on how to fix this issue?
Thanks
Sara Postorino
PhD student
University of Rome Tor Vergata