[QE-users] Error during diagonalization (memcpy, zhegvdx_gpu) in nscf with many bands (GPU)
Pietro Bonfa
pietro.bonfa at unipr.it
Sun Aug 30 23:17:46 CEST 2020
Dear Sara,
I'd suggest checking the following:
1. verify that the serial eigensolver is used (this is reported at the
beginning of the output);
2. use the latest version (6.6a1), which correctly reports problems
with memory allocation during the iterative diagonalization.
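Point 1 can be checked by scanning the pw.x output file. The helper below is a minimal sketch that assumes pw.x prints a line containing "Subspace diagonalization in iterative solution of the eigenvalue problem" followed by a line describing the algorithm (for example "a serial algorithm will be used"). The exact wording may differ between versions, and the function names are hypothetical.

```python
import sys
from typing import Optional

# Assumed header printed by pw.x before describing the subspace eigensolver;
# the wording may differ between QE versions.
HEADER = "Subspace diagonalization in iterative solution of the eigenvalue problem"


def subspace_solver(text: str) -> Optional[str]:
    """Return the line describing the subspace eigensolver, or None if absent."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if HEADER in line:
            # The algorithm is described on the next non-empty line.
            for following in lines[i + 1:]:
                if following.strip():
                    return following.strip()
            return None
    return None


def uses_serial_solver(text: str) -> bool:
    """True if the output reports the serial subspace eigensolver."""
    description = subspace_solver(text)
    return description is not None and "serial" in description.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        output = fh.read()
    print(subspace_solver(output) or "eigensolver line not found")
    print("serial:", uses_serial_solver(output))
```

Run it as `python check_solver.py pw.out`. If a parallel (ScaLAPACK) solver is reported instead, launching pw.x with `-ndiag 1` should request the serial one.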
Could you please also open an issue at
https://gitlab.com/QEF/q-e-gpu/-/issues and attach the input, the
pseudopotentials and the job script that you are using?
Thank you,
kind regards,
Pietro
On 8/29/20 6:33 PM, Sara Postorino wrote:
> Hi QE users,
>
> I am running PW on Marconi100 and experiencing problems during
> diagonalization. I am using version 6.5 (the module autoloaded on M100).
> My system is a MoTe2 bilayer with a 39x39x1 k mesh and many bands,
> because I will do a GW calculation on top of it. (The calculation
> works if I do not add many bands.)
> I tried with 4000 and 3000 bands using Davidson diagonalization running
> on 18 nodes:
> Parallel version (MPI & OpenMP), running on 2304 processor cores
> Number of MPI processes: 72
> Threads/MPI process: 32
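A rough estimate helps show why the failures depend on the number of bands. This is a sketch under stated assumptions: the Davidson solver is assumed to work in a reduced space of dimension `diago_david_ndim * nbnd` (with `diago_david_ndim = 2` assumed here) and to hold three dense double-complex matrices of that size (Hamiltonian, overlap and eigenvectors) on each process when the serial eigensolver is used. The function name is hypothetical, and the estimate ignores wavefunctions and the dense eigensolver's workspace.

```python
BYTES_PER_COMPLEX = 16  # one double-precision complex number


def reduced_matrix_bytes(nbnd: int, david_ndim: int = 2, n_matrices: int = 3) -> int:
    """Bytes for n_matrices dense complex matrices of size (david_ndim*nbnd)^2.

    Assumption: the Davidson reduced space has dimension david_ndim * nbnd and
    holds Hamiltonian, overlap and eigenvector matrices of that size.
    """
    nvecx = david_ndim * nbnd
    return n_matrices * nvecx * nvecx * BYTES_PER_COMPLEX


if __name__ == "__main__":
    for nbnd in (1000, 3000, 4000):
        gib = reduced_matrix_bytes(nbnd) / 2**30
        print(f"nbnd={nbnd}: ~{gib:.2f} GiB for the reduced matrices")
```

Under these assumptions, 4000 bands need about 3.07 GB for the three reduced matrices alone, versus about 0.19 GB for 1000 bands: the cost grows quadratically with `nbnd`.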
> When doing the calculation for the first k-point, I get:
>
> Really copied g2kin H->D
> Really copied evc H->D
> Really copied et H->D
> Really copied vrs H->D
> dp_memcpy_d2h_c2dinvalid pitch argument 12
>
> I also tried the conjugate-gradient algorithm, but it gets stuck at
>
> Really copied evc H->D
> Really copied et H->D
> Really copied h_diag H->D
> Really copied becp%nc H->D
> Really copied g2kin H->D
> Really copied vrs H->D
>
> It hangs here: I left it running for more than 1 hour and it didn't
> finish even one k-point. Since I have 147 k-points, the computation
> would be very expensive even if it worked.
>
> I also tried going down to 1000 bands (I need many more) and got:
> Really copied g2kin H->D
> Really copied evc H->D
> Really copied et H->D
> Really copied vrs H->D
> zhegvdx_gpu error: cusolverDnZpotrf failed!
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Error in routine cdiaghg_gpu (1):
> zhegvdx_gpu failed
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> Do you have any suggestions on how to fix this issue?
> Thanks
>
> Sara Postorino
> PhD student
> University of Rome Tor Vergata
>