[QE-users] [QE-GPU] Performance of the NGC Container

Fri Jul 23 23:17:56 CEST 2021

Hi Louis,

I posted the input file for the system that runs slower on the GPU node than on the CPU node:
https://github.com/jdh4/qe_container

Can you tell by looking if that system should run slower on the GPU?

For using the local MPI libraries, I had the "Bind model" in mind:
https://sylabs.io/guides/3.7/user-guide/mpi.html

I guess that would put too much of a burden on the end users. Performance for the AUSURF112 benchmark is excellent with the container, so no performance complaints here.

Jon
________________________________
From: users <users-bounces at lists.quantum-espresso.org> on behalf of Louis Stuber via users <users at lists.quantum-espresso.org>
Sent: Friday, July 16, 2021 12:10 PM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] [QE-GPU] Performance of the NGC Container

Hi Jonathan,

Thanks for your message, and apologies for the late reply. As Paolo mentioned, the GPU version should never be slower than the CPU one unless it calls routines that are not yet implemented on the GPU (fortunately, the one you mentioned has been implemented recently).

  *   CUDA-aware MPI is nice. It appears that the container is configured to use the MPI libraries in the container instead of those installed for the local cluster. Is this true? Can users take advantage of their local CUDA-aware MPI libraries?

Yes, a container will almost never see or use what's on your local cluster, except for low-level driver and kernel components. It is not possible to use your own MPI installation without rebuilding the container. However, the container that was uploaded to NGC already uses CUDA-aware MPI, IIRC, so it should already perform well in that regard.

Best,

Louis

From: users <users-bounces at lists.quantum-espresso.org> On Behalf Of Paolo Giannozzi
Sent: Tuesday, July 6, 2021 8:47 PM
To: Quantum ESPRESSO users Forum <users at lists.quantum-espresso.org>
Subject: Re: [QE-users] [QE-GPU] Performance of the NGC Container

The GPU acceleration of DFT-D3, using OpenACC, as well as its MPI parallelization, was implemented only a few days ago and will appear in the next release (soon). Apparently DFT-D3 takes a non-negligible amount of time. Without MPI parallelization or GPU acceleration, it can easily become a bottleneck when running on many processors or on GPUs.
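
To make the bottleneck argument concrete, here is a minimal sketch based on Amdahl's law. The function name, the unaccelerated fractions, and the 8x acceleration factor are illustrative assumptions, not measured Quantum ESPRESSO timings. It only shows how a part of the run that is not accelerated caps the overall gain. It does not model data-transfer overhead or the smaller number of CPU cores typically used in a GPU run, both of which could turn a small gain into an actual slowdown.

```python
def amdahl_speedup(unaccelerated_fraction: float, accel_factor: float) -> float:
    """Overall speed-up when only part of the run time is accelerated.

    unaccelerated_fraction: share of the original run time spent in code
        that is NOT accelerated (hypothetical stand-in for a routine such
        as DFT-D3 before it was ported).
    accel_factor: speed-up of the remaining, accelerated part.
    """
    if not 0.0 <= unaccelerated_fraction <= 1.0:
        raise ValueError("unaccelerated_fraction must be between 0 and 1")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    accelerated_fraction = 1.0 - unaccelerated_fraction
    return 1.0 / (unaccelerated_fraction + accelerated_fraction / accel_factor)


if __name__ == "__main__":
    # Illustrative fractions only; real values depend on the system studied.
    for fraction in (0.0, 0.05, 0.20, 0.50):
        overall = amdahl_speedup(fraction, accel_factor=8.0)
        print(f"unaccelerated fraction {fraction:4.2f}: overall speed-up {overall:.2f}x")
```

With these assumed numbers, a run that spends half of its time in an unaccelerated routine gains less than 2x overall, even though the rest is 8x faster.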

Paolo

On Tue, Jul 6, 2021 at 7:44 PM Jonathan D. Halverson <halverson at princeton.edu> wrote:

Hello (@Louis Stuber),

The QE container on NGC (https://ngc.nvidia.com/catalog/containers/hpc:quantum_espresso) appears to be running very well for us on a node with two A100s for the "AUSURF112, Gold surface (112 atoms), DEISA pw" benchmark. We see a speed-up of 8x in comparison to running on 80 Skylake CPU-cores (no GPUs) where the code was built from source.

The procedure we used for the above is here:

https://researchcomputing.princeton.edu/support/knowledge-base/quantum-espresso

However, for one system we see a slowdown (i.e., the code runs faster using only CPU-cores). Can you tell if the system below should perform well using the container?

"My system is basically just two carbon dioxide molecules and doing a single point calculation on them using the PBE-D3 functional and basically just altering the distance between the two molecules in the atomic coordinates."

Can someone comment in general on when one would expect the container running on GPUs to outperform a build-from-source executable running on CPU-cores?

CUDA-aware MPI is nice. It appears that the container is configured to use the MPI libraries in the container instead of those installed for the local cluster. Is this true? Can users take advantage of their local CUDA-aware MPI libraries?

Jon

_______________________________________________
Quantum ESPRESSO is supported by MaX (www.max-centre.eu)
users mailing list users at lists.quantum-espresso.org
https://lists.quantum-espresso.org/mailman/listinfo/users

--

Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 206, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222