<div dir="ltr"><div>Dear Fabio Affinito</div><div><br></div>Thank you so much for information. </div><div class="gmail_extra"><br clear="all"><div>"Apologizing does not mean that you are wrong and the other one is right...<br>

It simply means that you value the relationship much more than your ego.." </div>

<br><br><div class="gmail_quote">On Wed, Jul 23, 2014 at 5:32 PM, Nisha Agrawal <span dir="ltr"><<a href="mailto:itlinkstonisha@gmail.com" target="_blank">itlinkstonisha@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr">Hi, <div><br></div><div>I setup the quantum espresso Intel Xeon Phi version using the instruction provided in the following link</div><div> </div><div><a href="https://software.intel.com/en-us/articles/quantum-espresso-for-intel-xeon-phi-coprocessor" target="_blank">https://software.intel.com/en-us/articles/quantum-espresso-for-intel-xeon-phi-coprocessor</a> </div>


<div><br></div><div>However, when I run it, the computation is not offloaded to the Intel Xeon Phi. The following is the script I use</div><div>to run the QE MIC version. Please let me know if I have missed a required setting or am doing something wrong.</div>


<div><br></div><div>-------------------------------------------------------------------------------------------------------------------------------------------------------</div><div><div>source /home/opt/ICS-2013.1.039-intel64/bin/compilervars.sh intel64</div>


<div>source /home/opt/ICS-2013.1.039-intel64/mkl/bin/mklvars.sh intel64</div><div>source /home/opt/ICS-2013.1.039-intel64/impi/4.1.2.040/bin64/mpivars.sh</div>

</div><div><br>

</div><div><div><div>export MKL_MIC_ENABLE=1</div><div>export MKL_DYNAMIC=false</div><div>export MKL_MIC_DISABLE_HOST_FALLBACK=1</div><div>export MIC_LD_LIBRARY_PATH=$MKLROOT/lib/mic:$MIC_LD_LIBRARY_PATH</div></div></div>


<div><br></div><div>export OFFLOAD_DEVICES=0<br></div><div><br></div><div><div>export I_MPI_FALLBACK_DEVICE=disable</div><div>export I_MPI_PIN=disable</div><div>export I_MPI_DEBUG=5</div><div><br></div><div><br></div><div>


export MKL_MIC_ZGEMM_AA_M_MIN=500</div><div>export MKL_MIC_ZGEMM_AA_N_MIN=500</div><div>export MKL_MIC_ZGEMM_AA_K_MIN=500</div><div>export MKL_MIC_THRESHOLDS_ZGEMM=500,500,500</div><div><br></div><div><br></div><div>export OFFLOAD_REPORT=2</div>


</div><div>mpirun  -np 8 -perhost 4  ./espresso-5.0.2/bin/pw.x   -in  ./BN.in 2>&1 | tee test.log<br></div><div><br></div><div>---------------------------------------------------------------------</div><div>------------------------------------------------------------------------------- </div>
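<div>A minimal pre-flight check, sketched under the assumption that the run relies on MKL Automatic Offload as configured in the script above. The function check_mkl_ao_env is hypothetical (not part of QE or MKL): it only verifies that MKL_MIC_ENABLE, MKLROOT and MIC_LD_LIBRARY_PATH are consistent and, if given a binary, that ldd lists an MKL library. It cannot prove that offload will happen: MKL only offloads calls above its size thresholds, and a statically linked MKL will not show up in ldd output.</div>

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QE or MKL). It only checks that the
# environment matches the MKL Automatic Offload settings used in the script;
# passing it does not guarantee that any call is actually offloaded.
check_mkl_ao_env() {
  local binary=${1:-} status=0

  # In this setup, Automatic Offload is switched on through this variable.
  if [ "${MKL_MIC_ENABLE:-}" != "1" ]; then
    echo "MKL_MIC_ENABLE is not 1"
    status=1
  fi

  # The coprocessor-side MKL libraries must be on MIC_LD_LIBRARY_PATH.
  if [ -z "${MKLROOT:-}" ]; then
    echo "MKLROOT is not set (was mklvars.sh sourced?)"
    status=1
  else
    case ":${MIC_LD_LIBRARY_PATH:-}:" in
      *":$MKLROOT/lib/mic:"*) ;;
      *) echo "MIC_LD_LIBRARY_PATH does not contain $MKLROOT/lib/mic"
         status=1 ;;
    esac
  fi

  # Optional: if a binary is given, check that ldd lists an MKL library.
  # A statically linked MKL would not appear here, so treat this as a hint.
  if [ -n "$binary" ] && ! ldd "$binary" 2>/dev/null | grep -q 'libmkl'; then
    echo "$binary: no libmkl found in ldd output"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "environment looks consistent with MKL Automatic Offload"
  fi
  return "$status"
}
```

<div>Usage, after sourcing the setup lines above: check_mkl_ao_env ./espresso-5.0.2/bin/pw.x &amp;&amp; mpirun -np 8 -perhost 4 ./espresso-5.0.2/bin/pw.x -in ./BN.in</div>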


<div><br></div><div class="gmail_extra"><br clear="all"><div>"Apologizing does not mean that you are wrong and the other one is right...<br>It simply means that you value the relationship much more than your ego.." </div>


<br><br><div class="gmail_quote">On Mon, Jul 14, 2014 at 8:16 PM, Axel Kohlmeyer <span dir="ltr"><<a href="mailto:akohlmey@gmail.com" target="_blank">akohlmey@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>On Mon, Jul 14, 2014 at 9:34 AM, Eduardo Menendez <<a href="mailto:eariel99@gmail.com" target="_blank">eariel99@gmail.com</a>> wrote:<br>

> Thank you Axel. Your advice raises another question. Can we get the maximum<br>

> performance from a highly clocked CPU?<br>

> I used to think that the fastest CPUs were too fast for the memory<br>

> access, resulting in bottlenecks. Of course, it depends on the cache size.<br>

<br>

</div>your concern is justified, but the situation is more complex these<br>

days. highly clocked CPUs have fewer cores, so each core receives a larger<br>

share of the available memory bandwidth, and the highest clocked<br>

inter-CPU and memory buses are only available on a subset of the CPUs.<br>

now you have an optimization problem that has to consider the strong<br>

scaling (or lack thereof) of the code in question as an additional<br>

input parameter.<br>
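<div>To make the bandwidth argument concrete, a back-of-the-envelope sketch: assume (hypothetically) that a socket delivers a fixed memory bandwidth that is split evenly among its busy cores. The helper name bw_per_core and the 40 GB/s figure in the example are illustrative assumptions, not measured values.</div>

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not a measurement): a socket's memory
# bandwidth is shared evenly by its busy cores; caches and contention are ignored.
bw_per_core() {
  local socket_bw=$1 cores=$2   # socket_bw in GB/s, cores = busy cores per socket
  LC_ALL=C awk -v bw="$socket_bw" -v n="$cores" 'BEGIN { printf "%.2f\n", bw / n }'
}
```

<div>For example, bw_per_core 40 4 prints 10.00 and bw_per_core 40 6 prints 6.67, so under this simplified model each core of a quad-core part gets 1.5 times the bandwidth share of a core on a hex-core part with the same socket bandwidth.</div>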

<br>

to give an example: we purchased, at the same time, dual-socket nodes<br>

that had the same mainboard, but either 2x 3.5GHz quad-core or 2x<br>

2.8GHz hex-core. the 3.5GHz was the fastest clock available at the<br>

time. for classical MD, i get better performance out of the 12-core<br>

nodes; for plane-wave DFT i get about the same performance out of<br>

both; for CP2K i get better performance with the 8-core nodes (in fact, CP2K<br>

runs fastest on the 12-core nodes when using only 8 of their cores). now, the cost of<br>

the 2.8GHz CPUs is significantly lower, so that is why we procured the<br>

majority of the cluster with those. but we do have applications that<br>

scale worse than CP2K or are serial but require high per-core memory<br>

bandwidth, so we got a few of the 3.5GHz ones, too (and since they are<br>

already expensive, we filled them with as much RAM as possible without<br>

forcing the memory bus to run at a lower clock; in turn, we put "only" 1GB/core<br>

into the 12-core nodes).<br>
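<div>A crude way to see why the two node types trade places is to compare the aggregate clock (cores × GHz) per node. This is only a proxy that ignores memory bandwidth, turbo modes and vector units; node_core_ghz is a hypothetical helper, and the inputs are the two configurations named above.</div>

```shell
#!/usr/bin/env bash
# Crude proxy (hypothetical helper): total clock of a node in core-GHz.
# It ignores memory bandwidth, turbo modes and vectorization, so it only
# says how much raw clock a perfectly scaling code could use.
node_core_ghz() {
  local sockets=$1 cores_per_socket=$2 ghz=$3
  LC_ALL=C awk -v s="$sockets" -v c="$cores_per_socket" -v f="$ghz" \
    'BEGIN { printf "%.1f\n", s * c * f }'
}
```

<div>node_core_ghz 2 6 2.8 prints 33.6 and node_core_ghz 2 4 3.5 prints 28.0, so a code that keeps scaling to 12 ranks has about 20% more raw clock available on the hex-core nodes, while a code that stops scaling at around 8 ranks would rather have the faster 3.5GHz cores.</div>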

<br>

so it all boils down to finding the right balance and adjusting it to<br>

the application mix that you are running. last time i checked the<br>

intel spec sheets, it looked as if the best deal was to be had for<br>

CPUs with the second-largest number of CPU cores and a clock as high<br>

as required to run the memory bus at full speed. that will also keep the<br>

heat in check, as the highest clocked CPUs usually have a much higher<br>

TDP (>50% more), which places a much larger demand on cooling and<br>

power and incurs additional indirect costs as well.<br>
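<div>One reading of this rule of thumb, sketched as a filter over a CPU lineup: take the second-largest core count in the lineup and, among those parts, the lowest clock that still runs the memory bus at full speed. The input format (name, cores, GHz, supported memory MHz), the function name pick_sku and the SKU data in the example are hypothetical; a real choice also depends on price and on the application mix.</div>

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "name cores ghz mem_mhz" lines on stdin and
# prints the name of the SKU selected by the rule of thumb.
pick_sku() {
  local full_mem=$1 lineup second
  lineup=$(cat)

  # Second-largest distinct core count across the whole lineup
  # (falls back to the only count if there is just one).
  second=$(printf '%s\n' "$lineup" | awk '{print $2}' \
    | LC_ALL=C sort -n -u | tail -n 2 | head -n 1)

  # Among parts with that core count that support full memory speed,
  # take the lowest clock (lower cost and TDP) that still qualifies.
  printf '%s\n' "$lineup" \
    | awk -v c="$second" -v m="$full_mem" '$2 == c && $4 >= m' \
    | LC_ALL=C sort -k3,3n | head -n 1 | awk '{print $1}'
}
```

<div>With the made-up lineup printf 'cpuA 12 2.7 1866\ncpuB 10 2.5 1600\ncpuC 10 2.8 1866\ncpuD 10 3.0 1866\ncpuE 8 3.3 1866\n' piped into pick_sku 1866, the second-largest core count is 10, cpuB is dropped for its slower memory support, and cpuC is printed.</div>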

<br>

HTH,<br>

    axel.<br>

<div><br>

<br>

><br>

>> Stick with the CPU. For QE you should be best off with Intel. Also, you are<br>

>> likely to get the best price/performance ratio with CPUs that have fewer<br>

>> than the maximum number of CPU cores and a higher clock instead.<br>

><br>

><br>

> Eduardo Menendez Proupin<br>

> Departamento de Fisica, Facultad de Ciencias, Universidad de Chile<br>

> URL: <a href="http://www.gnm.cl/emenendez" target="_blank">http://www.gnm.cl/emenendez</a><br>

><br>

> “Science may be described as the art of systematic oversimplification.” Karl<br>

> Popper<br>

><br>

><br>

</div><div>> _______________________________________________<br>

> Pw_forum mailing list<br>

> <a href="mailto:Pw_forum@pwscf.org" target="_blank">Pw_forum@pwscf.org</a><br>

> <a href="http://pwscf.org/mailman/listinfo/pw_forum" target="_blank">http://pwscf.org/mailman/listinfo/pw_forum</a><br>

<br>

<br><span class="HOEnZb"><font color="#888888">

<br>

</font></span></div><span class="HOEnZb"><font color="#888888"><span><font color="#888888">--<br>

Dr. Axel Kohlmeyer  <a href="mailto:akohlmey@gmail.com" target="_blank">akohlmey@gmail.com</a>  <a href="http://goo.gl/1wk0" target="_blank">http://goo.gl/1wk0</a><br>

College of Science & Technology, Temple University, Philadelphia PA, USA<br>

International Centre for Theoretical Physics, Trieste. Italy.<br>

</font></span></font></span></blockquote></div><br></div></div>

</blockquote></div><br></div>