[Pw_forum] Parallel execution and configuration issues

Kirk khrusallis at gmail.com
Tue Sep 20 18:09:37 CEST 2016


Hello Nicola

I am very interested in that AiiDA workflow.

Thx much for the offer to share it.

Kirk

On Sep 20, 2016 4:05 AM, "nicola varini" <nicola.varini at epfl.ch> wrote:

> Dear Konrad, as Axel suggested, the MPI-only configuration might be the
> best in your case.
> However, if you would like to run on bigger machines or newer hardware
> like KNL processors, it's important to understand how OpenMP performs as
> well.
> To this end, using the AiiDA software (http://www.aiida.net/), I
> created a workflow that runs a series of benchmarks and compares the
> performance of different MPI+OpenMP combinations by splitting
> the communicators according to a certain logic.
> At the moment this is not part of the AiiDA release; however, if there
> is any interest I am more than happy to share it.
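>
> Stripped of the AiiDA machinery, the basic idea is roughly the following (an
> untested sketch; the total core count, thread counts and input file name are
> placeholders you would adapt to your own machine):
>
>    #!/bin/bash
>    # time pw.x for several MPI-rank x OpenMP-thread combinations on 16 cores
>    for threads in 1 2 4 8; do
>        ranks=$((16 / threads))
>        export OMP_NUM_THREADS=$threads
>        echo "== $ranks MPI ranks x $threads OpenMP threads =="
>        /usr/bin/time -p mpirun -np $ranks pw.x -in job.in > job_${ranks}x${threads}.out
>    done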
>
> Nicola
>
>
> On 20.09.2016 00:25, Konrad Gruszka wrote:
> > Dear Stefano and Axel,
> >
> > Thank you very much for your answers. In fact, the 'silly' mistake was
> > in confusing OpenMP with OpenMPI! (just like Axel suggested) - now it
> > seems so clear.
> > I'll now stick with OpenMPI or Hydra only (both did work fine within one
> > machine) and just try to configure it properly so it can 'see' both
> > hosts.
> >
> > Sometimes just a few words from experts can do real magic.
> >
> > Konrad.
> >
> > On 2016-09-19 at 20:17, Axel Kohlmeyer wrote:
> >> On Mon, Sep 19, 2016 at 6:01 AM, Konrad Gruszka <kgruszka at wip.pcz.pl>
> >> wrote:
> >>> Dear colleagues,
> >>>
> >>> I'm writing to our community to clarify once and for all some issues with
> >>> parallel pwscf execution.
> >>>
> >>> From reading the PWSCF manual, I know that there are several levels of
> >>> parallelization, including FFTW and more. At the same time, I'm quite sure
> >>> that most beginner QE users do not have complicated clusters, but are
> >>> rather starting with decent multicore PCs.
> >>>
> >>> There are two main parallel mechanisms (known to me), namely MPI and
> >>> OpenMP. Because I'm using Ubuntu, there is the free Hydra MPI software (mpich)
> >> As far as I know, Ubuntu ships with *two* MPI packages, OpenMPI (not
> >> OpenMP) and MPICH, and OpenMPI is the default MPI package.
> >> OpenMP is not a specific package, but a compiler feature (and included
> >> in GCC).
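> >>
> >> A quick way to see the distinction (assuming both MPI stacks are installed;
> >> the commands are only illustrative) is to check which mpirun you actually
> >> get, and to note that OpenMP is switched on per compilation by a compiler flag:
> >>
> >>    mpirun --version                     # reports Open MPI or MPICH (Hydra)
> >>    update-alternatives --list mpirun    # lists the installed MPI launchers
> >>    gfortran -fopenmp code.f90           # -fopenmp enables OpenMP in GCC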
> >>
> >>> and also free OpenMP. From what I know (please correct me if this is wrong),
> >>> MPI software is used for parallelization within one multicore PC and OpenMP
> >>> is used for parallelization across several PCs.
> >> That is incorrect. OpenMP is *restricted* to multi-core CPUs, because
> >> it is based on multi-threading.
> >> MPI (i.e. either OpenMPI or MPICH, but not both at the same time) can
> >> be used intra-node, inter-node, or a combination of both.
> >> In many cases, using only MPI is the most effective parallelization
> >> approach. Only when you have many cores per node and communication
> >> contention becomes a performance-limiting factor will you run faster
> >> with MPI+OpenMP.
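> >>
> >> In practice the two modes differ mainly in how you launch the job; a rough
> >> illustration for a single 12-core node (the rank/thread counts are just
> >> examples, and whether the hybrid variant is faster is something you measure):
> >>
> >>    # pure MPI: 12 ranks, 1 thread each
> >>    export OMP_NUM_THREADS=1
> >>    mpirun -np 12 pw.x -in job.in > job.out
> >>
> >>    # hybrid MPI+OpenMP: 4 ranks x 3 threads = 12 cores
> >>    export OMP_NUM_THREADS=3
> >>    mpirun -np 4 pw.x -in job.in > job.out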
> >>
> >>
> >>> So my reasoning is: because I have 2 multicore PCs (namely PC1: 6 cores with
> >>> 2 threads each, and PC2: 4 cores with 2 threads each, giving 12 'cores' on
> >>> PC1 and 8 on PC2), I've installed Hydra MPI on both PC1 and PC2 (for
> >>> parallelization within each multicore machine), and on both there is OpenMP
> >>> for communication between PC1 and PC2, so they act as a 'cluster'.
> >> You seem to be confusing OpenMPI with OpenMP. For your system, an
> >> MPI-only configuration (using the default MPI, i.e. OpenMPI) should
> >> work well. There is no need to worry about the additional complications from
> >> OpenMP until you have a larger, more potent machine.
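> >>
> >> With OpenMPI, an MPI-only run over your two machines would look roughly like
> >> this (hostnames and slot counts are placeholders for PC1 and PC2; pw.x must
> >> be built against the same OpenMPI and sit at the same path on both hosts):
> >>
> >>    # hostfile
> >>    pc1 slots=12
> >>    pc2 slots=8
> >>
> >>    mpirun -np 20 --hostfile hostfile pw.x -in job.in > job.out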
> >>
> >>> I've configured OpenMP so that both machines see one another, and when I
> >>> start a pw.x job using openmp on both machines I see that the processors
> >>> are 100% busy. The problem is that in the produced output file I can see
> >>> several repeats of the pwscf output (just as if several separate pwscf
> >>> processes were writing into one file at the same time).
> >> That is an indication that the executable was compiled without MPI support,
> >> or with support for an MPI library *different* from the one used when
> >> launching the executable.
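> >>
> >> One way to check (the commands are only a suggestion) is to look at which
> >> MPI library pw.x is linked against and which launcher you are actually calling:
> >>
> >>    ldd $(which pw.x) | grep -i mpi
> >>    which mpirun ; mpirun --version
> >>
> >> If the binary and the launcher belong to different MPI stacks, each process
> >> typically thinks it is the only one, so you get many copies of a serial run
> >> written into the same output file.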
> >>
> >>> QE was configured and compiled with --enable-openmp on both machines, and
> >>> when I simply run pw.x on PC1 it writes:
> >>>
> >>>     "Parallel version (MPI & OpenMP), running on      12 processor cores"
> >>>
> >>> I'm running the job with this command:
> >>>
> >>>    mpirun.openmp -np 20 pw.x -in job.in > job.out
> >>>
> >>> and, as said before, both machines are occupied with this job (tested using
> >>> htop).
> >>>
> >>> Looking for help on this topic I often arrive at the advice 'please contact
> >>> your system administrator', but sadly I am the administrator!
> >> Then look for somebody with experience with Linux clusters. You
> >> definitely want somebody local to help you, who can look over your
> >> shoulder while you are doing things and explain stuff. This is really
> >> tricky to do over e-mail. These days it should not be difficult to
> >> find somebody, as using Linux in parallel is ubiquitous in most
> >> computational science groups.
> >>
> >> There is also a *gazillion* of material available online, including
> >> talk slides from workshops, online courses, and
> >> self-study material. HPC University has an aggregator for such
> >> material at:
> >> http://hpcuniversity.org/trainingMaterials/
> >>
> >> Axel.
> >>
> >>> Could someone with a similar hardware configuration comment here on how
> >>> to achieve a properly working cluster?
> >>>
> >>> Sincerely yours
> >>>
> >>> Konrad Gruszka
> >>>
> >>>
> >>
>
> --
> Nicola Varini, PhD
>
> Scientific IT and Application Support (SCITAS)
> Theory and simulation of materials (THEOS)
> CE 0 813 (Bâtiment CE)
> Station 1
> CH-1015 Lausanne
> http://scitas.epfl.ch
>
> Nicola Varini
>