[QE-users] Multi-GPU Error on v.6.7MaX

Pietro Bonfa' pietro.bonfa at unipr.it
Mon Mar 22 11:50:28 CET 2021


Dear Jatin,

it's very hard to tell what the problem is without additional details.

Can you share your input?
Can you try running without pool parallelism (to reduce the memory 
footprint)?
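
As a minimal sketch of such a test run, reusing the resource request from your message: the launcher (mpirun), the executable being on PATH, and the file name pw.in are assumptions here and may differ on Comet. The -npool option of pw.x sets the number of k-point pools, so a value of 1 disables pool parallelism.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --mem=51G
#SBATCH --gres=gpu:p100:2

# Assumptions: pw.x is on PATH, mpirun is the MPI launcher on this machine,
# and pw.in is a placeholder for your own input file.
NPOOL=1   # a single k-point pool, i.e. no pool parallelism
PW_CMD="mpirun -np 2 pw.x -npool ${NPOOL} -inp pw.in"

# Print the command so it can be checked before submitting the job.
echo "Launching: ${PW_CMD}"

# Uncomment to actually run the calculation:
# ${PW_CMD} > pw.out
```

With a single pool, both MPI ranks work on the same k-point instead of each holding the data for its own, which is what lowers the memory needed per GPU.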

Since you _may_ be hitting a code-related problem, you can also consider 
opening a confidential issue on GitLab if you do not want to disclose 
some details.

Best,
Pietro



On 3/22/21 5:24 AM, Jatin Kashyap wrote:
> Dear QE Community Members,
> 
> I am trying to run PWSCF v.6.7MaX on the XSEDE Comet cluster 
> with the configuration given in [1],
> but the code exits with the error shown in [2].
> 
> Could anybody please help me find out how to fix it, if it is not a 
> machine error?
> 
> Thank you.
> 
> [1]
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --mem=51G
> #SBATCH --gres=gpu:p100:2
> 
> [2]
>   iteration #  1     ecut=    40.00 Ry     beta= 0.70
> Warning: ieee_inexact is signaling
>      1
>       Davidson diagonalization with overlap
>   zhegvdx_gpu error: cusolverDnZpotrf failed!
> 
>   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>       Error in routine  cdiaghg_gpu (1):
>        zhegvdx_gpu failed
>   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
>       stopping ...
> 
> 
> 
> ——
> Jatin Kashyap
> Ph.D. Student
> Dr. Dibakar Datta Group
> Department of Mechanical and Industrial Engineering
> New Jersey Institute of Technology (NJIT)
> University Heights
> Newark, NJ 07102-1982
> Phone- (201)889-5783
> Email- jk435 at njit.edu
> 
> 
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users
> 


