<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:Wingdings;

        panose-1:5 0 0 0 0 0 0 0 0 0;}

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Verdana;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph

        {mso-style-priority:34;

        margin-top:0in;

        margin-right:0in;

        margin-bottom:0in;

        margin-left:.5in;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

span.EmailStyle18

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

/* List Definitions */

@list l0

        {mso-list-id:1599871973;

        mso-list-type:hybrid;

        mso-list-template-ids:403977440 -811166000 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}

@list l0:level1

        {mso-level-start-at:10;

        mso-level-number-format:bullet;

        mso-level-text:\F0D8;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        mso-ansi-font-size:12.0pt;

        font-family:Wingdings;

        mso-fareast-font-family:"Times New Roman";

        mso-bidi-font-family:Calibri;

        color:black;}

@list l0:level2

        {mso-level-number-format:bullet;

        mso-level-text:o;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:"Courier New";}

@list l0:level3

        {mso-level-number-format:bullet;

        mso-level-text:\F0A7;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:Wingdings;}

@list l0:level4

        {mso-level-number-format:bullet;

        mso-level-text:\F0B7;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:Symbol;}

@list l0:level5

        {mso-level-number-format:bullet;

        mso-level-text:o;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:"Courier New";}

@list l0:level6

        {mso-level-number-format:bullet;

        mso-level-text:\F0A7;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:Wingdings;}

@list l0:level7

        {mso-level-number-format:bullet;

        mso-level-text:\F0B7;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:Symbol;}

@list l0:level8

        {mso-level-number-format:bullet;

        mso-level-text:o;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:"Courier New";}

@list l0:level9

        {mso-level-number-format:bullet;

        mso-level-text:\F0A7;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;

        font-family:Wingdings;}

ol

        {margin-bottom:0in;}

ul

        {margin-bottom:0in;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]-->

</head>

<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">

<div class="WordSection1">

<p class="MsoNormal">Hi Jonathan,<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">Thanks for your message, and apologies for the late reply. As Paolo mentioned, the GPU version should never be slower than the CPU one unless it calls routines that are not yet implemented on the GPU (fortunately, the one you asked about has been implemented recently).<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<ul style="margin-top:0in" type="disc">

<li class="MsoListParagraph" style="color:#201F1E;margin-left:0in;mso-list:l0 level1 lfo1">

<span style="font-size:12.0pt;color:black">CUDA-aware MPI is nice. It appears that the container is configured to use the MPI libraries in the container instead of those installed for the local cluster. Is this true? Can users take advantage of their local

 CUDA-aware MPI libraries?</span><span style="font-size:11.5pt"><o:p></o:p></span></li></ul>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">Yes, a container will almost never see or use what’s installed on your local cluster, apart from low-level components such as the drivers and the kernel. It is not possible to use your own MPI installation without rebuilding the container. However, if I recall correctly, the container uploaded to NGC already uses CUDA-aware MPI, so it should already perform well in that regard.<o:p></o:p></p>
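<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">As a rough way to check this from inside the container, and assuming its MPI is Open MPI (an assumption, not something confirmed in this thread), the sketch below looks for the <code>mpi_built_with_cuda_support</code> parameter in the output of <code>ompi_info --parsable --all</code>. The helper names <code>parse_cuda_support</code> and <code>query_cuda_support</code> are hypothetical and only for illustration.</p>
<pre>
```python
import shutil
import subprocess

# Parameter name reported by Open MPI's ompi_info (assumes Open MPI is the MPI in use).
CUDA_KEY = "mpi_built_with_cuda_support:value:"


def parse_cuda_support(ompi_info_text):
    """Return True or False if the build flag is present in the text, else None."""
    for line in ompi_info_text.splitlines():
        if CUDA_KEY in line:
            return line.strip().endswith(":true")
    return None


def query_cuda_support():
    """Run ompi_info if it is on PATH and parse its output; None if unavailable."""
    if shutil.which("ompi_info") is None:
        return None
    result = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cuda_support(result.stdout)


if __name__ == "__main__":
    print("CUDA-aware Open MPI build:", query_cuda_support())
```
</pre>
<p class="MsoNormal">To say anything about the container's MPI, the script has to be run inside the container; run on the host, it only reports on the cluster's own installation.</p>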

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">Best,<o:p></o:p></p>

<p class="MsoNormal">Louis<o:p></o:p></p>

<div>

<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">

<p class="MsoNormal"><b>From:</b> users &lt;users-bounces@lists.quantum-espresso.org&gt;

<b>On Behalf Of </b>Paolo Giannozzi<br>

<b>Sent:</b> Tuesday, July 6, 2021 8:47 PM<br>

<b>To:</b> Quantum ESPRESSO users Forum &lt;users@lists.quantum-espresso.org&gt;<br>

<b>Subject:</b> Re: [QE-users] [QE-GPU] Performance of the NGC Container<o:p></o:p></p>

</div>

</div>

<p class="MsoNormal"><o:p> </o:p></p>


<div>

<div>

<div>

<p class="MsoNormal">The GPU acceleration of DFT-D3, using OpenACC, as well as its MPI parallelization, was implemented only a few days ago and will appear in the next release (soon). Apparently DFT-D3 takes a non-negligible amount of time. Without MPI parallelization or GPU acceleration, it may easily become a bottleneck when running on many processors or on GPUs.<o:p></o:p></p>
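<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A rough way to see why a single unported routine matters is Amdahl's law. The sketch below assumes a hypothetical fraction of the run time that gets no acceleration and applies an illustrative 8x speedup to the rest (the figure quoted for AUSURF112 elsewhere in this thread). The function name and the fractions are made up for illustration; they are not measured DFT-D3 timings.</p>
<pre>
```python
def overall_speedup(unaccelerated_fraction, accelerated_speedup):
    """Amdahl's law: overall speedup when only part of the run time is accelerated.

    unaccelerated_fraction: share of the original run time that is not sped up
                            (hypothetical here, e.g. an unported routine).
    accelerated_speedup:    speedup factor applied to the remaining run time.
    """
    accelerated_fraction = 1.0 - unaccelerated_fraction
    return 1.0 / (unaccelerated_fraction + accelerated_fraction / accelerated_speedup)


if __name__ == "__main__":
    # Illustrative fractions only; 8.0 is the AUSURF112 speedup reported in the thread.
    for fraction in (0.0, 0.05, 0.20, 0.50):
        speedup = overall_speedup(fraction, 8.0)
        print(f"{fraction:.2f} of run time not accelerated: overall speedup {speedup:.2f}x")
```
</pre>
<p class="MsoNormal">This only bounds the achievable speedup. It does not model an actual slowdown, which would also depend on how many CPU cores the unaccelerated part can use in each setup.</p>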

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">Paolo<o:p></o:p></p>

</div>

</div>

<p class="MsoNormal"><o:p> </o:p></p>

<div>

<div>

<p class="MsoNormal">On Tue, Jul 6, 2021 at 7:44 PM Jonathan D. Halverson &lt;<a href="mailto:halverson@princeton.edu">halverson@princeton.edu</a>&gt; wrote:<o:p></o:p></p>

</div>

<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">

<div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Hello (@Louis Stuber),<o:p></o:p></span></p>

</div>

<div>

<div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">The QE container on NGC (<a href="https://ngc.nvidia.com/catalog/containers/hpc:quantum_espresso" target="_blank">https://ngc.nvidia.com/catalog/containers/hpc:quantum_espresso</a>) appears to be running very well for us on a node with two A100s for the "AUSURF112, Gold surface (112 atoms), DEISA pw" benchmark. We see a speed-up of 8x in comparison to running on 80 Skylake CPU-cores (no GPUs) where the code was built from source.</span><span style="font-size:11.5pt;color:#201F1E"><o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">The procedure we used for the above is here:</span><span style="font-size:11.5pt;color:#201F1E"><o:p></o:p></span></p>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><a href="https://researchcomputing.princeton.edu/support/knowledge-base/quantum-espresso" target="_blank"><span style="font-size:12.0pt;color:black">https://researchcomputing.princeton.edu/support/knowledge-base/quantum-espresso</span></a><o:p></o:p></span></p>

</div>

</div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">However, for one system we see a slowdown (i.e., the code runs faster using only CPU-cores). Can you tell whether the system below should perform well using the container?</span><span style="font-size:11.5pt;color:#201F1E"><o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">"My system is basically just two carbon dioxide molecules and doing a single point calculation on them using the PBE-D3 functional and basically just altering the distance between the two molecules

 in the atomic coordinates."</span><span style="font-size:11.5pt;color:#201F1E"><o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Can someone comment in general on when one would expect the container running on GPUs to outperform a build-from-source executable running on CPU-cores?</span><span style="font-size:11.5pt;color:#201F1E"><o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">CUDA-aware MPI is nice. It appears that the container is configured to use the MPI libraries in the container instead of those installed for the local cluster. Is this true? Can users take advantage

 of their local CUDA-aware MPI libraries?</span><span style="font-size:11.5pt;color:#201F1E"><o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><span style="font-size:11.5pt;color:#201F1E"><o:p> </o:p></span></p>

</div>

<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Jon<o:p></o:p></span></p>

</div>

</div>

</div>

<p class="MsoNormal">_______________________________________________<br>

Quantum ESPRESSO is supported by MaX (<a href="http://www.max-centre.eu/" target="_blank">www.max-centre.eu</a>)<br>

users mailing list <a href="mailto:users@lists.quantum-espresso.org" target="_blank">users@lists.quantum-espresso.org</a><br>

<a href="https://lists.quantum-espresso.org/mailman/listinfo/users" target="_blank">https://lists.quantum-espresso.org/mailman/listinfo/users</a><o:p></o:p></p>

</blockquote>

</div>

<p class="MsoNormal"><br clear="all">

<br>

-- <o:p></o:p></p>

<div>

<div>

<div>

<div>

<div>

<p class="MsoNormal" style="margin-bottom:12.0pt">Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,<br>

Univ. Udine, via delle Scienze 206, 33100 Udine, Italy<br>

Phone +39-0432-558216, fax +39-0432-558222<o:p></o:p></p>

</div>

</div>

</div>

</div>

</div>

</div>

</div>

</body>

</html>