[QE-users] Error during diagonalization (memcpy, zhegvdx_gpu) in nscf with many bands (GPU)

Sara Postorino sara.pst at gmail.com
Mon Aug 31 18:54:39 CEST 2020


Thank you for your response,

I ran it again with 6.5 (I couldn't install 6.6a1); it uses the serial
eigensolver.

Now I get:
     Band Structure Calculation
     Davidson diagonalization with overlap

     Computing kpt #:     1  of     9 on this pool
 Really copied g2kin H->D
 Really copied evc H->D
 Really copied et H->D

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine  cegterg (1):
      cannot allocate vc_d
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...
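
As a rough sanity check on the allocation failure, the sketch below estimates how much memory the reduced matrices of the Davidson solver would need for a given number of bands. It rests on assumptions that are not taken from the output above: that the solver holds three square complex double-precision matrices of dimension david * nbnd, where david stands for the diago_david_ndim input value, and that each element takes 16 bytes. The function name and the david values in the loop are illustrative only.

```python
def davidson_reduced_matrix_bytes(nbnd, david, n_matrices=3, bytes_per_element=16):
    """Estimate the bytes needed by the reduced matrices of a Davidson solver.

    Assumptions (not confirmed by the log above):
      - the reduced basis has dimension nvecx = david * nbnd,
      - n_matrices square nvecx x nvecx matrices are allocated,
      - each element is a complex double (16 bytes).
    """
    nvecx = david * nbnd
    return n_matrices * nvecx * nvecx * bytes_per_element


if __name__ == "__main__":
    # Band counts mentioned in this thread; david values are assumed examples.
    for david in (2, 4):
        for nbnd in (1000, 3000, 4000):
            gib = davidson_reduced_matrix_bytes(nbnd, david) / 2**30
            print(f"david={david} nbnd={nbnd}: about {gib:.2f} GiB")
```

The estimate grows with the square of the number of bands, so under these assumptions going from 1000 to 4000 bands multiplies this part of the workspace by 16. It ignores the wavefunction arrays and everything else that lives on the GPU, so it is only a lower bound to compare against the memory of one device.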

I attach the input and output.

I'll put the rest on GitLab.

Thank you,
Sara


On Sun, Aug 30, 2020 at 23:18, Pietro Bonfa <pietro.bonfa at unipr.it>
wrote:

> Dear Sara,
>
> I'd suggest checking the following:
>
> 1. verify that the serial eigensolver is used (it's written at the
> beginning of the output);
>
> 2. use the latest version (6.6a1) that will correctly report problems
> with memory allocations during the iterative diagonalization.
>
> Could you please also open an issue at
> https://gitlab.com/QEF/q-e-gpu/-/issues and attach the input, the
> pseudopotentials and the job script that you are using?
>
> Thank you,
> kind regards,
> Pietro
>
>
>
> On 8/29/20 6:33 PM, Sara Postorino wrote:
> > Hi QE users,
> >
> > I am running PW on Marconi100 and experiencing problems during
> > diagonalization. I am using version 6.5 (autoload of the modules on m100).
> > My system is a MoTe2 bilayer with a 39x39x1 k-mesh and many bands,
> > because I will run a GW calculation on top of it. (The calculation
> > works if I do not add many bands.)
> > I tried with 4000 and 3000 bands using Davidson diagonalization running
> > on 18 nodes:
> > Parallel version (MPI & OpenMP), running on    2304 processor cores
> >       Number of MPI processes:                72
> >       Threads/MPI process:                    32
> > When doing the calculation of the first k-point I get:
> >
> >   Really copied g2kin H->D
> >   Really copied evc H->D
> >   Really copied et H->D
> >   Really copied vrs H->D
> >   dp_memcpy_d2h_c2dinvalid pitch argument           12
> >
> > I also tried the conjugate gradient algorithm, but it gets stuck at
> >
> >   Really copied evc H->D
> >   Really copied et H->D
> >   Really copied h_diag H->D
> >   Really copied becp%nc H->D
> >   Really copied g2kin H->D
> >   Really copied vrs H->D
> >
> > And here it takes forever. I left it running for more than 1 hour and it
> > didn't finish one k-point, and since I have 147 k-points the computation
> > would be very expensive even if it worked.
> >
> > I also tried going down to 1000 bands (I need many more) and got:
> >   Really copied g2kin H->D
> >   Really copied evc H->D
> >   Really copied et H->D
> >   Really copied vrs H->D
> >   zhegvdx_gpu error: cusolverDnZpotrf failed!
> >
> >
>  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >       Error in routine  cdiaghg_gpu (1):
> >        zhegvdx_gpu failed
> >
>  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >
> > Do you have any suggestion on how to fix this issue?
> > Thanks
> >
> > Sara Postorino
> > PhD student
> > University of Rome Tor Vergata
> >
> > _______________________________________________
> > Quantum ESPRESSO is supported by MaX (http://www.max-centre.eu/quantum-espresso)
> > users mailing list users at lists.quantum-espresso.org
> > https://lists.quantum-espresso.org/mailman/listinfo/users
> >


-------------- next part --------------
A non-text attachment was scrubbed...
Name: MoTe2_2L_39_nscf.out
Type: application/octet-stream
Size: 37772 bytes
Desc: not available
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20200831/b87c77bf/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MoTe2_2L_39_nscf.in
Type: application/octet-stream
Size: 850 bytes
Desc: not available
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20200831/b87c77bf/attachment-0001.obj>

