<div dir="ltr">Hi Ivan &amp; Axel, the computer on which I did see a slight speed-up was an XE6, but the other computer is not an XE6 (I'm not sure which company supplied it).  Once I have access to that computer again I will try your suggestions, but since, as you said, there may not be much of a performance gain, I will probably not spend too much time on it.  On the computer with Xeon processors there is a limit of about 64 cores, so I probably won't run into a communication bottleneck (whereas on the XE6 I can use many more).  Thank you for your time and help.  All the best, Ben.</div>

<div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Nov 9, 2013 at 12:54 PM, Axel Kohlmeyer <span dir="ltr">&lt;<a href="mailto:akohlmey@gmail.com" target="_blank">akohlmey@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Sat, Nov 9, 2013 at 1:14 PM, Ivan Girotto &lt;<a href="mailto:igirotto@ictp.it">igirotto@ictp.it</a>&gt; wrote:<br>


> Dear Ben,<br>

><br>

> I'm afraid you are packing all processes within a node onto the same socket<br>

> (-bind-to-socket).<br>

> My recommendation is to use the following: -cpus-per-proc 2 -bind-to-core.<br>

> However, for the pw.x code there is not much reason to expect better<br>

> performance on the Intel Xeon architecture from MPI+OpenMP until communication becomes a<br>

> serious bottleneck.<br>

> Indeed, distributing the work among MPI processes generally gives<br>

> better scaling.<br>

<br>

</div>... and with current hardware even keeping cores idle can be<br>

beneficial to the overall performance.<br>
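<br>
A minimal sketch of Ivan's suggestion, assuming Open MPI 1.6 (the module loaded in Ben's script) and the same job size as in this thread (2 nodes with 16 cores each, 2 OpenMP threads per MPI rank). Each rank gets its own pair of cores via -cpus-per-proc 2 -bind-to-core instead of -bind-to-socket, and the rank counts are derived from the node size rather than hard-coded. The script only prints the launch line.<br>
<pre>
```shell
#!/bin/bash
# Sketch (assumption): hybrid MPI+OpenMP launch for Open MPI 1.6 using
# "-cpus-per-proc 2 -bind-to-core" instead of "-bind-to-socket".
# Sizes match the job in this thread: 2 nodes x 16 cores, 2 threads/rank.
NODES=2
CORES_PER_NODE=16
export OMP_NUM_THREADS=2

# One MPI rank per block of OMP_NUM_THREADS cores, so no two ranks share a core.
RANKS_PER_NODE=$(( CORES_PER_NODE / OMP_NUM_THREADS ))
NP=$(( NODES * RANKS_PER_NODE ))

# Print the launch line; drop "echo" to actually run it inside the job.
# "-x OMP_NUM_THREADS" forwards the current value of the variable to every rank.
echo mpiexec -np "$NP" -npernode "$RANKS_PER_NODE" \
     -x OMP_NUM_THREADS \
     -cpus-per-proc "$OMP_NUM_THREADS" -bind-to-core -report-bindings \
     pw_openmp_5.0.2.x -in benchmark2.in
```
</pre>
With the echo removed and the output redirected to benchmark2c.out, this would take the place of the mpiexec line in Ben's script; the -report-bindings output should then show each rank on its own pair of cores. Newer Open MPI releases renamed these binding options, so check mpiexec --help for the installed version.<br>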

<br>

there is a very nice discussion of the various parallelization options<br>

in the QE user's guide. for almost all "normal" machines OpenMP by<br>

itself is inferior to all the other available parallelization options,<br>

so you should exploit those first. exceptions are "unusual" machines<br>

like IBM's Blue Gene architecture or Cray's XT/XE machines, and cases<br>

where you need to parallelize over an extreme number of processors, and<br>

at *that* point leaving some cores idle and/or using OpenMP is indeed<br>

helpful to squeeze out a little extra performance.<br>
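<br>
To illustrate "exploit the other options first", here is a pure-MPI sketch using pw.x's k-point pools (-npool), one of the parallelization levels described in the QE user's guide. NPOOL=4 is a hypothetical value: pools only help if the input has enough k-points, and the guide for 5.0.2 should be checked for the exact options available. As far as I know, pw.x splits the ranks evenly into pools, so the script checks that the rank count is a multiple of the pool count before printing the launch line.<br>
<pre>
```shell
#!/bin/bash
# Sketch (assumptions): pure-MPI run on the same 2 x 16-core nodes, using
# k-point pools instead of OpenMP. NPOOL=4 is a hypothetical value and only
# helps if the input has at least that many k-points.
NODES=2
CORES_PER_NODE=16
NPOOL=4
export OMP_NUM_THREADS=1          # no threading: one MPI rank per core

NP=$(( NODES * CORES_PER_NODE ))

# Ranks are divided evenly among the pools, so NP must be a multiple of NPOOL.
if [ $(( NP % NPOOL )) -ne 0 ]; then
  echo "error: NP=$NP is not a multiple of NPOOL=$NPOOL"
  exit 1
fi

# Print the launch line; drop "echo" to actually run it inside the job.
echo mpiexec -np "$NP" -npernode "$CORES_PER_NODE" -bind-to-core \
     pw.x -npool "$NPOOL" -in benchmark2.in
```
</pre>
Timing this against the hybrid run with the same total core count is a direct way to see which option pays off on these nodes.<br>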

<br>

ciao,<br>

   axel.<br>

<br>

p.s.: with OpenMP on x86 you should also experiment with the<br>

OMP_WAIT_POLICY environment variable. most OpenMP implementations use<br>

the ACTIVE policy, which implies busy waiting and would theoretically<br>

lower the latency, but the alternative PASSIVE policy can be more<br>

efficient, especially when you leave one core per block of threads<br>

idle. remember that on regular machines threads have to compete with<br>

other processes on the machine for access to time slices in the<br>

scheduler. with busy waiting, the time slices are always fully consumed,<br>

even if no work is being done. calling sched_yield(), as implied by the<br>

PASSIVE mode, quickly releases the time slice and lets other<br>

processes do work, which in turn increases the probability that your<br>

thread will be scheduled again sooner, and that can<br>

significantly reduce latencies at implicit or explicit synchronization<br>

points. if all this sounds like Greek to you, then you should<br>

definitely follow the advice in the QE user's guide and avoid<br>

OpenMP... ;-)<br>
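<br>
A sketch of the p.s. applied to the 16-core E5-2660 nodes. The split into 4-core blocks per MPI rank, with 3 OpenMP threads each and one core per block left idle, is a hypothetical choice for illustration, not a tuned setting. OMP_WAIT_POLICY is the standard OpenMP environment variable discussed above, and Open MPI's -x option forwards it to every rank.<br>
<pre>
```shell
#!/bin/bash
# Sketch (assumption): 4-core blocks per MPI rank, 3 OpenMP threads each,
# one core per block left idle; not a tuned setting.
NODES=2
CORES_PER_NODE=16
CORES_PER_RANK=4

export OMP_NUM_THREADS=$(( CORES_PER_RANK - 1 ))  # leave one core per block idle
export OMP_WAIT_POLICY=PASSIVE                     # yield instead of busy-waiting

RANKS_PER_NODE=$(( CORES_PER_NODE / CORES_PER_RANK ))
NP=$(( NODES * RANKS_PER_NODE ))

# "-x VAR" makes Open MPI forward the current value of VAR to every rank;
# each rank is bound to its whole 4-core block, including the idle core.
echo mpiexec -np "$NP" -npernode "$RANKS_PER_NODE" \
     -x OMP_NUM_THREADS -x OMP_WAIT_POLICY \
     -cpus-per-proc "$CORES_PER_RANK" -bind-to-core \
     pw_openmp_5.0.2.x -in benchmark2.in
```
</pre>
Running the same job once with OMP_WAIT_POLICY=ACTIVE and once with PASSIVE, everything else unchanged, isolates the effect Axel describes.<br>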

<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

><br>

> Regards,<br>

><br>

> Ivan<br>

><br>

><br>

> On 08/11/2013 13:45, Ben Palmer wrote:<br>

><br>

> Hi Everyone,<br>

><br>

> (apologies if this has been sent twice)<br>

><br>

> I have compiled QE 5.0.2 on a computer with AMD Interlagos processors, using<br>

> ACML, with OpenMP enabled, and submitting jobs with PBS.  I've<br>

> had a speed-up using 2 OpenMP threads per MPI process.<br>

><br>

> I've been trying to do the same on another computer that has MOAB as the<br>

> scheduler, E5-series Xeon processors (E5-2660), and the Intel MKL.<br>

> I'm pretty sure hyperthreading has been turned off, as each node<br>

> has two sockets and 16 cores in total.<br>
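<br>
One way to settle the hyperthreading question, as a sketch: on Linux, lscpu prints a "Thread(s) per core" line, where 1 means one hardware thread per physical core (hyperthreading off). The E5-2660 has 8 physical cores per socket, so 16 cores on a dual-socket node is consistent with hyperthreading being off, provided the node reports 16 rather than 32 logical CPUs. The helper name threads_per_core is hypothetical.<br>
<pre>
```shell
#!/bin/bash
# Sketch: report whether hyperthreading is enabled on a compute node.
# threads_per_core is a hypothetical helper: it reads lscpu-style
# "Key: value" text on stdin and prints the "Thread(s) per core" value
# with whitespace stripped (1 = hyperthreading off, 2 = on).
threads_per_core() {
  awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Run this inside a job on a compute node, not on the login node.
if command -v lscpu >/dev/null; then
  echo "threads per core: $(lscpu | threads_per_core)"
fi
```
</pre>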

><br>

> I've seen a slowdown using hybrid OpenMP and MPI, but I have read in<br>

> the documentation that this might be the case.  I'm waiting in the<br>

> computer's queue to run the following:<br>

><br>

> #!/bin/bash<br>

> #MOAB -l "nodes=2:ppn=16"<br>

> #MOAB -l "walltime=0:01:00"<br>

> #MOAB -j oe<br>

> #MOAB -N pwscf_calc<br>

> #MOAB -A readmsd02<br>

> #MOAB -q bbtest<br>

> cd "$PBS_O_WORKDIR"<br>

> module load apps/openmpi/v1.6.3/intel-tm-ib/v2013.0.079<br>

> export PATH=$HOME/bin:$PATH<br>

> export OMP_NUM_THREADS=2<br>

> mpiexec -np 16 -x OMP_NUM_THREADS=2 -npernode 8 -bind-to-socket -display-map \<br>

> -report-bindings pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out<br>

><br>

> I just wondered if anyone had any tips on the settings or flags for hybrid<br>

> MPI/OpenMP with the E5 Xeon processors?<br>

><br>

> All the best,<br>

><br>

> Ben Palmer<br>

> Student @ University of Birmingham, UK<br>

><br>

><br>

> _______________________________________________<br>

> Pw_forum mailing list<br>

> <a href="mailto:Pw_forum@pwscf.org">Pw_forum@pwscf.org</a><br>

> <a href="http://pwscf.org/mailman/listinfo/pw_forum" target="_blank">http://pwscf.org/mailman/listinfo/pw_forum</a><br>


<br>

<br>

<br>

</div></div><span class="HOEnZb"><font color="#888888">--<br>

Dr. Axel Kohlmeyer  <a href="mailto:akohlmey@gmail.com">akohlmey@gmail.com</a>  <a href="http://goo.gl/1wk0" target="_blank">http://goo.gl/1wk0</a><br>

International Centre for Theoretical Physics, Trieste, Italy.<br>

</font></span><div class="HOEnZb"><div class="h5">

</div></div></blockquote></div><br></div>