[Pw_forum] Hybrid MPI/OpenMP

Axel Kohlmeyer akohlmey at gmail.com
Sat Nov 9 13:54:38 CET 2013


On Sat, Nov 9, 2013 at 1:14 PM, Ivan Girotto <igirotto at ictp.it> wrote:
> Dear Ben,
>
> I'm afraid you are packing all the processes within a node onto the same
> socket (-bind-to-socket).
> My recommendation is to use the following: -cpus-per-proc 2 -bind-to-core.
> However, for the pw.x code there is not much expectation of better
> performance on the Intel Xeon architecture with MPI+OpenMP until
> communication becomes a serious bottleneck.
> Indeed, distributing the parallel work among MPI processes generally
> offers better scaling.

... and with current hardware even keeping cores idle can be
beneficial to the overall performance.
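
Concretely, Ivan's suggestion applied to the mpiexec line in Ben's script
(quoted below) could look something like the sketch here; the option names
assume Open MPI 1.6.x, which the script loads (newer Open MPI versions use
--map-by instead):

  export OMP_NUM_THREADS=2
  mpiexec -np 16 -npernode 8 -x OMP_NUM_THREADS=2 \
      -cpus-per-proc 2 -bind-to-core \
      -display-map -report-bindings \
      pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out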

There is a very nice discussion of the various parallelization options
in the QE user's guide. For almost all "normal" machines, OpenMP by
itself is inferior to all the other available parallelization options,
so you should exploit those first. Exceptions are "unusual" machines
like IBM's BlueGene architecture or Cray's XT/XE machines, and cases
where you need to parallelize across an extreme number of processors;
at *that* point leaving some cores idle and/or using OpenMP is indeed
helpful to squeeze out a little extra performance.
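
As an illustration only, an MPI-only run that uses QE's own parallelization
levels instead of OpenMP could look something like the sketch below on two
of Ben's 16-core nodes (the -npool value of 4 and the output file name are
just illustrative choices; see the user's guide for how to pick the pool
count for your system):

  # one MPI process per core, k-point pools instead of OpenMP threads
  export OMP_NUM_THREADS=1
  mpiexec -np 32 -npernode 16 -bind-to-core \
      pw.x -npool 4 -in benchmark2.in > benchmark2_mpi.out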

ciao,
   axel.

P.S.: With OpenMP on x86 you should also experiment with the
OMP_WAIT_POLICY environment variable. Most OpenMP implementations use
the ACTIVE policy, which implies busy waiting and should in theory
lower latency, but the alternative PASSIVE policy can be more
efficient, especially when you leave one core per block of threads
idle. Remember that on regular machines threads have to compete with
other processes for access to time slices in the scheduler. With busy
waiting, the time slices are always fully consumed even when no work
is being done. Calling sched_yield(), as the PASSIVE mode implies,
quickly releases the time slice and lets other processes do work,
which in turn increases the probability that your thread will be
scheduled again sooner, and that can significantly reduce latencies at
implicit or explicit synchronization points. If all of this sounds
like Greek to you, then you should definitely follow the advice in the
QE user's guide and avoid OpenMP... ;-)
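
In a job script like Ben's below, trying this is just a matter of setting
the variable and forwarding it to every MPI rank, e.g. something along
these lines:

  export OMP_NUM_THREADS=2
  export OMP_WAIT_POLICY=PASSIVE
  mpiexec -np 16 -npernode 8 \
      -x OMP_NUM_THREADS=2 -x OMP_WAIT_POLICY=PASSIVE \
      -cpus-per-proc 2 -bind-to-core \
      pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out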



>
> Regards,
>
> Ivan
>
>
> On 08/11/2013 13:45, Ben Palmer wrote:
>
> Hi Everyone,
>
> (apologies if this has been sent twice)
>
> I have compiled QE 5.0.2 on a computer with AMD Interlagos processors, using
> ACML, compiling with OpenMP enabled, and submitting jobs with PBS.  I've
> had a speed-up using 2 OpenMP threads per MPI process.
>
> I've been trying to do the same on another computer, which has MOAB as the
> scheduler, E5-series Xeon processors (E5-2660), and uses the Intel MKL.
> I'm pretty sure hyperthreading has been turned off, as each node has two
> sockets and 16 cores in total.
>
> I've seen a slowdown in performance using OpenMP and MPI, but have read in
> the documentation that this might be the case.  I'm waiting in the
> computer's queue to run the following:
>
> #!/bin/bash
> #MOAB -l "nodes=2:ppn=16"
> #MOAB -l "walltime=0:01:00"
> #MOAB -j oe
> #MOAB -N pwscf_calc
> #MOAB -A readmsd02
> #MOAB -q bbtest
> cd "$PBS_O_WORKDIR"
> module load apps/openmpi/v1.6.3/intel-tm-ib/v2013.0.079
> export PATH=$HOME/bin:$PATH
> export OMP_NUM_THREADS=2
> mpiexec -np 16 -x OMP_NUM_THREADS=2 -npernode 8 -bind-to-socket -display-map
> -report-bindings pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out
>
> I just wondered if anyone had any tips on the settings or flags for hybrid
> MPI/OpenMP with the E5 Xeon processors?
>
> All the best,
>
> Ben Palmer
> Student @ University of Birmingham, UK
>
>



-- 
Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste, Italy.


