[Pw_forum] Hybrid MPI/OpenMP

Ben Palmer benpalmer1983 at gmail.com
Mon Nov 11 00:52:16 CET 2013


Hi Ivan & Axel,

The computer on which I did see a slight speed-up was an XE6, but the
other computer is not (I'm not sure which company supplied it).  Once I
have access to that machine again I will try your suggestions, but since,
as you said, there may not be much of a performance increase, I probably
won't spend too much time on it.  On the computer with the Xeon processors
I am limited to about 64 cores, so I probably won't run into a
communications bottleneck (whereas on the XE6 I can use many more).

Thank you for your time and help.

All the best, Ben.


On Sat, Nov 9, 2013 at 12:54 PM, Axel Kohlmeyer <akohlmey at gmail.com> wrote:

> On Sat, Nov 9, 2013 at 1:14 PM, Ivan Girotto <igirotto at ictp.it> wrote:
> > Dear Ben,
> >
> > I'm afraid you are packing all the processes within a node onto the same
> > socket (-bind-to-socket).
> > My recommendation is to use the following instead: -cpus-per-proc 2
> > -bind-to-core.
> > However, for the pw.x code there is not much to be gained on the Intel
> > Xeon architecture from MPI+OpenMP until communication becomes a serious
> > bottleneck.
> > Indeed, distributing the parallel work among MPI processes generally
> > scales better.
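> >
> > For your job that would look something like this (an untested sketch,
> > reusing the other flags from your script; check the exact option names
> > for your Open MPI version):
> >
> >   export OMP_NUM_THREADS=2
> >   mpiexec -np 16 -npernode 8 -cpus-per-proc 2 -bind-to-core \
> >       -x OMP_NUM_THREADS -display-map -report-bindings \
> >       pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out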
>
> ... and with current hardware even keeping cores idle can be
> beneficial to the overall performance.
>
> there is a very nice discussion of the various parallelization options
> in the QE user's guide. for almost all "normal" machines OpenMP by
> itself is inferior to all the other available parallelization options,
> so you should exploit those first. exceptions are "unusual" machines
> like IBM's BlueGene architecture or Cray's XT/XE machines, and cases
> where you need to parallelize to an extreme number of processors; at
> *that* point leaving some cores idle and/or using OpenMP is indeed
> helpful to squeeze out a little extra performance.
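>
> for example, the command-line options look like this (the numbers are
> just placeholders; see the user's guide for how to pick -npool / -ntg /
> -ndiag values that match your k-points and FFT grids):
>
>   mpiexec -np 64 pw.x -npool 4 -ntg 2 -ndiag 16 -in pw.in > pw.out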
>
> ciao,
>    axel.
>
> p.s.: with OpenMP on x86 you should also experiment with the
> OMP_WAIT_POLICY environment variable. most OpenMP implementations use
> the ACTIVE policy, which implies busy waiting and would theoretically
> lower latency, but the alternative PASSIVE policy can be more
> efficient, especially when you leave one core per block of threads
> idle. remember that on regular machines threads have to compete with
> other processes on the machine for access to time slices in the
> scheduler. with busy waiting, they are always fully consumed, even if
> there is no work being done. calling sched_yield(), as is implied by the
> PASSIVE mode, will quickly release the time slice and let other
> processes do work, which in turn increases the probability that your
> thread will be scheduled more quickly again, which in turn can
> significantly reduce latencies at implicit or explicit synchronization
> points. if all of this sounds like Greek to you, then you should
> definitely follow the advice in the QE user's guide and avoid
> OpenMP... ;-)
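>
> e.g., in the job script, exported to all MPI ranks via Open MPI's -x
> flag (just a sketch, using the same executable and input names as in
> your script):
>
>   export OMP_NUM_THREADS=2
>   export OMP_WAIT_POLICY=PASSIVE
>   mpiexec -np 16 -x OMP_NUM_THREADS -x OMP_WAIT_POLICY \
>       pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out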
>
>
>
> >
> > Regards,
> >
> > Ivan
> >
> >
> > On 08/11/2013 13:45, Ben Palmer wrote:
> >
> > Hi Everyone,
> >
> > (apologies if this has been sent twice)
> >
> > I have compiled QE 5.0.2 on a computer with AMD Interlagos processors,
> > using ACML, compiling with OpenMP enabled, and submitting jobs with PBS.
> > I've had a speed-up using 2 OpenMP threads per MPI process.
> >
> > I've been trying to do the same on another computer, which has MOAB as
> > the scheduler, E5-series Xeon processors (E5-2660), and uses the Intel
> > MKL.  I'm pretty sure hyperthreading has been turned off, as each node
> > has two sockets and 16 cores in total.
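> >
> > (A quick way to double-check that, if you can get a shell on a compute
> > node, is something like
> >
> >   lscpu | grep -E 'Thread|Core|Socket'
> >
> > which should report "Thread(s) per core: 1" when hyperthreading is off.)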
> >
> > I've seen a slowdown in performance using OpenMP and MPI, but have read
> > in the documentation that this can happen.  I'm waiting in the
> > computer's queue to run the following:
> >
> > #!/bin/bash
> > #MOAB -l "nodes=2:ppn=16"
> > #MOAB -l "walltime=0:01:00"
> > #MOAB -j oe
> > #MOAB -N pwscf_calc
> > #MOAB -A readmsd02
> > #MOAB -q bbtest
> > cd "$PBS_O_WORKDIR"
> > module load apps/openmpi/v1.6.3/intel-tm-ib/v2013.0.079
> > export PATH=$HOME/bin:$PATH
> > export OMP_NUM_THREADS=2
> > mpiexec -np 16 -x OMP_NUM_THREADS=2 -npernode 8 -bind-to-socket \
> >     -display-map -report-bindings \
> >     pw_openmp_5.0.2.x -in benchmark2.in > benchmark2c.out
> >
> > I just wondered if anyone had any tips on the settings or flags for
> > hybrid MPI/OpenMP with the E5 Xeon processors?
> >
> > All the best,
> >
> > Ben Palmer
> > Student @ University of Birmingham, UK
> >
> >
> > _______________________________________________
> > Pw_forum mailing list
> > Pw_forum at pwscf.org
> > http://pwscf.org/mailman/listinfo/pw_forum
> >
>
>
>
> --
> Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
> International Centre for Theoretical Physics, Trieste. Italy.
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>