<div dir="ltr"><div class="gmail_default" style="font-family:times new roman,serif">Dear users and developers,<br><br>I am currently using two Tesla K40m cards for my computational work with Quantum ESPRESSO (QE).

My GPU-enabled QE build runs much slower than the normal version. My question is whether a particular application will be fast only with certain versions of the CUDA toolkit (as mentioned in a previous post: <a href="http://qe-forge.org/pipermail/pw_forum/2015-May/106889.html">http://qe-forge.org/pipermail/pw_forum/2015-May/106889.html</a>), or whether something else, such as memory, is limiting GPU performance. (When I run the top command on my server, the 'VIRT' column shows different values; the top output is pasted in the attached file.)<br>

<br>The following message was also generated when I submitted the job: "A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces: Module: OpenFabrics (openib)  Host: XXXX Another transport will be used instead, although this may result in lower performance". Is this MPI issue hindering GPU performance?</div><div class="gmail_default" style="font-family:times new roman,serif"><br>

(P.S.: We don't have an InfiniBand adapter (HCA) in the server.)<br>

<br><br>

The current server details are as follows (full details attached):<br>

<br>

Server: FUJITSU PRIMERGY RX2540 M2<br>

CUDA version: 9.0<br>NVIDIA driver: 384.9<br>

Open MPI version: 2.0.4 with Intel MKL libraries<br>

QE-GPU version: 5.4.0<br>

<br>

<br>

Thanks in advance<br></div><div class="gmail_default" style="font-family:times new roman,serif">

<br></div><div class="gmail_default" style="font-family:times new roman,serif">Regards<br></div><div class="gmail_default" style="font-family:times new roman,serif">Phanikumar<br></div></div>