[QE-users] Multi-GPU Error on v.6.7MaX
Pietro Bonfa'
pietro.bonfa at unipr.it
Mon Mar 22 11:50:28 CET 2021
Dear Jatin,
it's very hard to tell what the problem is without additional details.
Can you share your input?
Can you try running without pool parallelism (to reduce the memory
footprint)?
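
As a rough sketch of what that could look like: pw.x accepts -npool to set the number of k-point pools, so -npool 1 turns pool parallelism off. The mpirun launcher, the file name pw.in, and the choice of one MPI rank per GPU are assumptions here and not taken from the original job script. The snippet only prints the command, so it can be checked before being placed in the batch script.

```shell
#!/bin/sh
# Sketch only: launcher and file names below are assumptions.
NP=2          # assumed: one MPI rank per GPU, matching --ntasks-per-node=2
NPOOL=1       # -npool 1 disables k-point pool parallelism
INPUT=pw.in   # hypothetical input file name

# Build the command line and print it (dry run).
CMD="mpirun -np $NP pw.x -npool $NPOOL -input $INPUT"
echo "$CMD"
```

To actually run it, the echo line would be replaced by the command itself with the output redirected to a file.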
Since you _may_ be hitting a code-related problem, you can also consider
opening a confidential issue on GitLab if you do not want to disclose
some details.
Best,
Pietro
On 3/22/21 5:24 AM, Jatin Kashyap wrote:
> Dear QE Community Members,
>
> I am trying to run Program PWSCF v.6.7MaX on the XSEDE Comet cluster
> with the given configuration[1]
> But the code is exiting with an error[2].
>
> Can anybody please help to find out how to fix it if it is not a
> machine-error?
>
> Thank you.
>
> [1]
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --mem=51G
> #SBATCH --gres=gpu:p100:2
>
> [2]
> iteration # 1 ecut= 40.00 Ry beta= 0.70
> Warning: ieee_inexact is signaling
> 1
> Davidson diagonalization with overlap
> zhegvdx_gpu error: cusolverDnZpotrf failed!
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Error in routine cdiaghg_gpu (1):
> zhegvdx_gpu failed
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> stopping ...
>
>
>
> ——
> Jatin Kashyap
> Ph.D. Student
> Dr. Dibakar Datta Group
> Department of Mechanical and Industrial Engineering
> New Jersey Institute of Technology (NJIT)
> University Heights
> Newark, NJ 07102-1982
> Phone- (201)889-5783
> Email- jk435 at njit.edu
>
>
> _______________________________________________
> Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
> users mailing list users at lists.quantum-espresso.org
> https://lists.quantum-espresso.org/mailman/listinfo/users
>