[QE-users] CUDA errors
Paolo Giannozzi
paolo.giannozzi at uniud.it
Wed Nov 16 22:46:37 CET 2022
With 4 GPUs I would try 4 MPI ranks and npool=4 (and maybe 4 OpenMP
threads per rank). The second error message means that you are out of
memory: with pools, i.e. k-point parallelization, memory is only
partially distributed. About the first one, no idea.
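As a minimal sketch of such a launch (assuming a single node with 4
GPUs, a GPU-enabled pw.x build, and a hypothetical input file
Ta2O5.in; adapt the MPI launcher and scheduler options to your
machine):

  export OMP_NUM_THREADS=4
  mpirun -np 4 pw.x -npool 4 -input Ta2O5.in > Ta2O5.out

With npool=4 and one MPI rank per pool, each pool works on a subset
of the 26 k-points on its own GPU.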
Paolo
On 11/11/2022 16:32, Christopher Moltisanti wrote:
>
> Hello QE users,
>
> I am quite new to QE and I have been experimenting with different
> runtime configurations on GPUs. In particular, I am running the TA2O5
> input (26 k-points). Depending on the number of MPI ranks and/or npool
> values I am getting different runtime errors. To name a few, the most
> frequent ones I get are:
>
> 1) Configuration: 16 MPI ranks, 4 MPI ranks/node, 4 GPUs/node, npool=1
> Error:
> Error in routine fft_scatter_many_columns_to_planes_store (1):
> cudaMemcpy2DAsync failed
>
> 2) Configuration: 16 MPI ranks, 4 MPI ranks/node, 4 GPUs/node, npool=16
> Error:
> Dense grid: 3645397 G-vectors FFT dimensions: ( 200, 180, 216)
> 0: ALLOCATE: 11202625536 bytes requested; status = 2(out of memory)
>
> Could you please help me understand what is going on?
>
> Regards
> Chris
>
--
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine Italy, +39-0432-558216