[Pw_forum] OpenMPI problems with espresso 4.0cvs

Javier Antonio Montoya j.antonio.montoya at gmail.com
Mon Apr 28 01:26:20 CEST 2008


Dear  Riccardo,

First of all let me clarify that "-in" is not an option for mpirun but for
espresso itself, i.e. you should type something like

mpirun -np 4 directory/where/espresso/is/pw.x -in file.in > file.out

instead of

mpirun -np 4 directory/where/espresso/is/pw.x < file.in > file.out

That's all.

Second, specifying options like "mpirun -np 4 --host localhost" or "mpirun h
C ..." is not likely to solve your problem.


Now, I don't know how often people explain, step by step, the configuration
for a particular machine, but since I got a 4-core Xeon just one week ago I
will try to post here all the steps that I have made so far (please execute
the first four steps as root; the ">" symbols that appear at the start of some
of the lines are just there to signal the terminal prompt, you don't have to
type them):

1- I downloaded the latest versions of icc, ifort, and mkl (10.1.015,
10.1.015, and 10.0.2.018 respectively) from Intel's web site as the
non-supported (non-commercial) versions and installed all the packages,
including idbg (idbg comes twice, with icc and with ifort, but it is the same
package).

2- I copied the files ifortvars.sh, iccvars.sh, idbvars.sh, and
mklvarsem64t.sh (all of this assuming you are using bash as your shell) that
come with the compilers and the mkl libraries to the directory /etc/profile.d/
(which should already exist, assuming that you don't use ubuntu on your Xeon
machine); a sketch of the copy commands follows below.
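
For reference, a minimal sketch of that copy step; the Intel install paths
below are assumptions based on typical default locations for those releases,
so locate the actual *vars.sh files on your system first:

==================================================================

> find /opt/intel -name "*vars*.sh"     # check where the installer really put them
> cp /opt/intel/cce/10.1.015/bin/iccvars.sh                       /etc/profile.d/
> cp /opt/intel/fce/10.1.015/bin/ifortvars.sh                     /etc/profile.d/
> cp /opt/intel/idb/10.1.015/bin/idbvars.sh                       /etc/profile.d/
> cp /opt/intel/mkl/10.0.2.018/tools/environment/mklvarsem64t.sh  /etc/profile.d/

==================================================================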

3- Now you should download the latest version of LAM/MPI (I found this one
easier to configure than mpich2 at runlevel 2, since it does not require the
mpd daemon in order to run properly); a small unpacking sketch follows below.
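
A sketch of the download-and-unpack step, assuming you have already fetched
the lam-7.1.4 tarball from the LAM/MPI web site (the tarball location below is
just a placeholder for wherever you saved it):

==================================================================

> mkdir -p /opt/MPI
> cd /opt/MPI
> tar -xzf /path/to/lam-7.1.4.tar.gz     # creates /opt/MPI/lam-7.1.4

==================================================================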

Let's say that you unpack your lam/mpi inside a directory named /opt/MPI/ ;
then you will have a directory that looks like /opt/MPI/lam-7.1.4 . Now you
can create, inside that directory, a configure script that looks like:


==================================================================

>vi myscript.sh
#!/bin/sh

cd /opt/MPI/lam-7.1.4
CC=icc
CXX=icpc
FC=ifort

export CC CXX FC

# Lam without gm
./configure --prefix=/opt/MPI/lam-7.1.4/lam --with-rpi=usysv \
            --without-threads --disable-tv-queue

make
make install

exit

=================================================================


And make the script executable:

> chmod 755 myscript.sh

This version of the script is the most basic one you can have in order to
compile your lam/mpi, but I do not know what other options could be used in
order to enhance lam/mpi performance on a Xeon machine with the specific
compilers that I am using. If somebody who knows how to improve on this posts
that information I would be very grateful as well.

Execute the script by typing:

> ./myscript.sh
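
A side note, mostly for later experimentation: with LAM you can also pick the
transport (rpi) module at run time through the standard SSI syntax, without
recompiling anything. This is a generic LAM example, not an espresso-specific
recipe, and /some/program is just a placeholder:

==================================================================

> mpirun -np 4 -ssi rpi usysv /some/program    # shared-memory rpi (our configured default)
> mpirun -np 4 -ssi rpi tcp   /some/program    # plain tcp, handy for debugging

==================================================================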


IF YOU'VE MANAGED TO MAKE ALL OF THIS WORK WITH A DIFFERENT COMPILER YOU
DON'T NEED TO DO IT AGAIN, UNLESS YOU THINK THAT INTEL COMPILERS CAN DO A
BETTER JOB.

When the process ends successfully you also need to create a hostfile inside
/opt/MPI/lam-7.1.4/lam/etc/ ; go in there and do:


===================================================================

>vi hostfile
localhost
localhost
localhost
localhost

===================================================================
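
If I remember the LAM boot-schema syntax correctly, the same thing can also be
written as a single line with a cpu count; treat this as a hedged alternative,
since the four-line version above is the one I actually tested:

===================================================================

>vi hostfile
localhost cpu=4

===================================================================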


4- Then, all you have to do now is create yet another file inside
/etc/profile.d/ :


===================================================================

>vi lam.sh
#!/bin/sh

export LAM=/opt/MPI/lam-7.1.4/lam

export OMP_NUM_THREADS=1

PATH=/opt/MPI/lam-7.1.4/lam/include:/opt/MPI/lam-7.1.4/lam/bin:$PATH

export PATH

LD_LIBRARY_PATH=/opt/MPI/lam-7.1.4/lam/lib:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH

MANPATH=/opt/MPI/lam-7.1.4/lam/man:$MANPATH

export MANPATH

export LAMRSH=ssh

===================================================================


Note the following:
       - export OMP_NUM_THREADS=1 , as recently suggested by Axel Kohlmeyer,
is there in order to avoid bad behavior from the mkl libraries.
       - PATH=/opt/MPI/lam-7.1.4/lam/include : without this piece espresso's
"./configure" command will not properly set up the parallel compilation, since
the file "mpif.h" with the MPI definitions is in there.

        What I observed by running "mpirun -np 4" with a non-parallel pw.x
on my machine was that with the "<" redirection the code crashed, while with
the "-in" option the code seems to run, but it does the very same calculation
on each core and therefore writes the same output four times into a single
output file, of course still taking the same time that one single processor
would use.

5- Now, as a normal user (log out from root), go and test that your parallel
environment is working (maybe you will want to reboot your machine, or
"source" each one of the "xxx.sh" scripts recently created inside
/etc/profile.d, or simply open a new terminal):
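
To pick the new settings up in the current shell, for instance, you could do
something like this (a fresh login shell would source them automatically):

===================================================================

> source /etc/profile.d/iccvars.sh
> source /etc/profile.d/ifortvars.sh
> source /etc/profile.d/mklvarsem64t.sh
> source /etc/profile.d/lam.sh
> which mpirun
/opt/MPI/lam-7.1.4/lam/bin/mpirun

===================================================================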


> laminfo
             LAM/MPI: 7.1.4
              Prefix: /opt/MPI/lam-7.1.4/lam
        Architecture: x86_64-unknown-linux-gnu
       Configured by: root
       Configured on: Sun Apr 27 16:38:22 EDT 2008
      Configure host: GiNUx3
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: icc
        C++ compiler: icpc
    Fortran compiler: ifort
     Fortran symbols: underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: no
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)

>lamboot

LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University


>lamnodes
n0      localhost.localdomain:1:origin,this_node

>lamhalt

LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University



NOW ESPRESSO,

6- After having done the previous steps, the configuration of espresso on my
machine just worked by doing ./configure , but I had to do the following in
order to make espresso understand that the system is a parallel machine,

as root:

>cp /opt/MPI/lam-7.1.4/lam/bin/mpif77 /opt/MPI/lam-7.1.4/lam/bin/mpif90

This is in order to make sure that the parallel compiler is named mpif90;
mpif77 is just the default name that lam/mpi uses, and it is capable of
compiling any fortran code. The last part of the output of  ./configure  then
looks like this after execution:

--------------------------------------------------------------------
ESPRESSO can take advantage of several optimized numerical libraries
(essl, fftw, mkl...).  This configure script attempts to find them,
but may fail if they have been installed in non-standard locations.
If a required library is not found, the local copy will be compiled.

The following libraries have been found:
  BLAS_LIBS= -lmkl_em64t
  LAPACK_LIBS=  -lmkl_em64t
  FFT_LIBS=
Please check if this is what you expect.

If any libraries are missing, you may specify a list of directories
to search and retry, as follows:
  ./configure LIBDIRS="list of directories, separated by spaces"

Parallel environment detected successfully.
Configured for compilation of parallel executables.

For more info, read the ESPRESSO User's Guide (Doc/users-guide.tex).
--------------------------------------------------------------------
configure: success



7- The make.sys file should NOW contain:

=======================================================

DFLAGS         =  -D__INTEL -D__FFTW -D__USE_INTERNAL_FFTW -D__MPI -D__PARA

MPIF90         = mpif90

LD             = mpif90

=======================================================

Interestingly enough, with this setup I can redirect the input with "<"
instead of using "-in" without getting any error, and I can then run all the
examples in parallel.
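
As a usage sketch, running one of the distributed examples by hand then looks
roughly like this (the espresso path and input file are placeholders; the
si.scf.cg.in input is the one from exercise01 mentioned below):

===================================================================

> lamboot
> mpirun -np 4 /path/to/espresso-4.0/bin/pw.x -in si.scf.cg.in > si.scf.cg.out
> lamhalt

===================================================================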



CHEERS,

J. A. MONTOYA





On Fri, Apr 25, 2008 at 10:46 AM, Riccardo Sabatini <sabatini at sissa.it>
wrote:

> Hi again,
>
>   i wrote "3 times" meaning that even if i launch mpirun with 4
> processors the error comes out just three times; maybe mpirun launches
> only three instances of the program, sorry for the misunderstanding. I
> tried to launch with the direct location of the file instead of the std
> input redirection but it still does not work (i don't usually launch pw
> interactively, that was just a pasted snippet to let the mailing list
> understand me).
>
>   I haven't tried the -in flag yet, anyway; it seems the openmpi i have
> doesn't understand that flag, but i'll look in the help manual to see
> if there's something similar, maybe the problem is there. I'll let you
> know as soon as i try.
>
>   Thanks for the help, regards
>
>     Riccardo
>
> Quoting Axel Kohlmeyer <akohlmey at cmm.chem.upenn.edu>:
>
> > On Fri, 25 Apr 2008, Riccardo Sabatini wrote:
> >
> > RS> Hello everyone,
> >
> > hi riccardo,
> >
> > RS>
> > RS>     i finally compiled espresso with MPI (thanks for the suggestion,
> > RS> with gfortran worked perfectly). I had no problem in the compilation
> > RS> but i can't make it run. I'm trying a super easy run: from the
> > RS> exercise01 the si.scf.cg.in.
> > RS>
> > RS>     Now, if i run the file with the espresso 3.2 compiled without mpi
> > RS> it obviously runs perfectly, but if i try the same file with the mpi
> > RS> version it gives me this error (3 times)
> >
> > doing the same thing 3 times doesn't make it more likely to work...
> >
> > RS>       stopping ...
> > RS>
> >
> ---------------------------------------------------------------------------------
> > RS>
> > RS>      My launch command is (i'm running on a four-core processor
> > RS> now)
> > RS>
> > RS>       mpirun -np 4 ../QE-MPI/bin/pw.x < prova.in
> >
> > have you tried the -in flag? not all MPI implementations
> > replicate the input across all nodes and i personally
> > think it is in general a bad idea to read an input from
> > stdin. we don't run anything interactively these days
> > anyways and being able to check file status etc. is
> > a big advantage.
> >
> > cheers,
> >     axel.
> >
> > RS>      Is there something i'm missing? Maybe a line to add for parallel
> > RS> compilation in the input file? I've tried the only option in
> > RS> INPUT_PW about parallel compilation, wf_collect, but nothing
> > RS> changes.
> > RS> Since the compilation gave me 0 errors maybe the problem is the
> > RS> combination openMPI+gfortran+espresso-4.0.
> > RS>
> > RS>      Thanks for the help,
> > RS>
> > RS>                 Riccardo
> > RS>
> > RS> ----------------------------------------------------------------
> > RS>    SISSA Webmail https://webmail.sissa.it/
> > RS>    Powered by Horde http://www.horde.org/
> > RS>
> > RS>
> > RS> _______________________________________________
> > RS> Pw_forum mailing list
> > RS> Pw_forum at pwscf.org
> > RS> http://www.democritos.it/mailman/listinfo/pw_forum
> > RS>
> >
> > --
> > =======================================================================
> > Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
> >    Center for Molecular Modeling   --   University of Pennsylvania
> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> > tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
> > =======================================================================
> > If you make something idiot-proof, the universe creates a better idiot.
> >
>
> ----------------------------------------------------------------
>    SISSA Webmail https://webmail.sissa.it/
>    Powered by Horde http://www.horde.org/
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum
>

