[QE-users] QE GPU docker at NVIDIA NGC memory hang
Louis Stuber
lstuber at nvidia.com
Tue Apr 27 17:16:06 CEST 2021
Hi!
I chatted with Karim but thought it would be worth sharing this with others as well.
Regarding the memory error: unfortunately, Ausurf is not that small a system. You need at least 32 GB of GPU memory (one 32 GB card or 2x16 GB) to run it. Karim had only 1x16 GB.
To further diagnose OOM errors you can use the estimated memory consumption in QE’s output:
Estimated max dynamical RAM per process > 20.83 GB
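As a minimal sketch of how that estimate could be checked automatically, the Python snippet below pulls the figure out of the output text and compares it with the memory of one card. The function names and the 16 GB / 32 GB values are illustrative only. Treating the per-process RAM estimate as a rough proxy for GPU memory is an assumption, as is the possibility of the value being printed in MB; the estimate is per process, so it will change when the run is split across more MPI processes or GPUs.

```python
import re

# Matches the estimate line from the QE output quoted above, e.g.
# "Estimated max dynamical RAM per process >      20.83 GB".
# Accepting "MB" as a unit is an assumption for smaller systems.
_ESTIMATE = re.compile(
    r"Estimated max dynamical RAM per process\s*>\s*([\d.]+)\s*(MB|GB)"
)


def estimated_ram_gb(output_text):
    """Return the largest per-process estimate in GB, or None if absent."""
    values = []
    for number, unit in _ESTIMATE.findall(output_text):
        value = float(number)
        values.append(value / 1024 if unit == "MB" else value)
    return max(values) if values else None


def fits_on_gpu(estimate_gb, gpu_memory_gb):
    """Rough check only: compares the estimate with one card's memory."""
    return estimate_gb <= gpu_memory_gb


sample = "     Estimated max dynamical RAM per process >      20.83 GB"
estimate = estimated_ram_gb(sample)
print(estimate, fits_on_gpu(estimate, 16.0), fits_on_gpu(estimate, 32.0))
```

For the sample line this reports that 20.83 GB does not fit on a single 16 GB card but does fit on a 32 GB one, which matches the sizing given above.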
Regarding the NGC container: we use it a lot internally, and I honestly think it’s a very useful tool. If you are open to trying it but run into issues, please do not hesitate to ping me; I am happy to help.
In short, install docker, create a (free) account on nvcr.io, then run something like:
docker run -v /tmp:/tmp --privileged --ipc=host --shm-size 100g --cap-add=ALL --gpus all -it nvcr.io/hpc/quantum_espresso:v6.7
The idea of this container is to provide an easy way to run QE on most common GPU systems, similar to apt-get, but more optimized (e.g. with CUDA-aware MPI).
Thanks.
Louis
From: users <users-bounces at lists.quantum-espresso.org> On Behalf Of Karim Elgammal
Sent: Thursday, April 22, 2021 11:49 PM
To: users at lists.quantum-espresso.org
Subject: [QE-users] QE GPU docker at NVIDIA NGC memory hang
Hi,
I was wondering whether anyone here has tried the NGC docker image on 1 card and got it to work?
It keeps giving the error message: 0: ALLOCATE: 14730730704 bytes requested; status = 2(out of memory)
even when specifying more OMP threads, increasing I/O, or decreasing the number of k-points.
It only works on very small systems, though; the supplied ausurf system doesn't run through.
It would be great if the docker owner could provide the config line as well!
--
Thank you and Best Regards;
Yours;
Karim Elgammal
researcher
KTH