[Pw_forum] problem about MPI

mbaris at metu.edu.tr
Tue Jun 5 07:46:53 CEST 2007


Dear Bo Peng,

Doing a quick Google search on the problem, I came up with
http://www.myri.com/serve/cache/236.html

That is, the secondary script that generates the dynamic port number used
in MPI intercommunication is not working. This is most probably an
installation issue regarding LSF; or do you have to call such a script
yourself in the submitted batch file?

If you do have to call it yourself, are you sure you are using the Myrinet
version of MPI? (I suppose you are on Myrinet, as the first error suggests.)
Besides that, it is always a very bad idea to try to bypass the tight
integration of grid engines; admins often terminate runaway jobs on the
nodes automatically. I am also not sure you are bypassing it correctly in
that script: are you calling mpirun twice?
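
A quick way to see which MPI stack the batch nodes actually pick up is
sketched below (run it on a compute node or inside a short test job; it is
only an illustration, adjust the commands to your own site). If pw.x was
built with an mpif90 from a non-GM MPICH, the mpich_gm integration will
most likely not be able to launch it properly, and the other way around.
--------------------------
# Which MPI commands come first in the PATH on the nodes?
# A GM-enabled MPICH usually lives under a directory with "gm" in its
# name; a plain ethernet/ssh build does not.
which mpif90
which mpirun
which mpirun.lsf
--------------------------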

I think this problem is not related to espresso; rather, it is an LSF issue
at your site.
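
For reference, here is a minimal sketch of what a tightly integrated
submission usually looks like on an LSF + MPICH-GM site. The queue name,
job name, core count and file names are copied from your own script;
whether mpirun.lsf is the correct wrapper on your installation is an
assumption that your admins should confirm.
--------------------------
#!/bin/bash
#BSUB -q demo
#BSUB -J ex01
#BSUB -a mpich_gm
#BSUB -o %J.log
#BSUB -n 16

# Let LSF start the MPI processes itself; do not call the raw mpirun.
# -npool is a pw.x option, so it goes after the executable name.
mpirun.lsf pw.x -npool 8 < si.scf.cg.in > si.scf.cg.out
--------------------------
The point is that the wrapper supplied by the LSF installation, not a
hand-written mpirun line, is what should pass the GMPI magic number and
the host list to the MPICH-GM processes.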

If you are looking for an alternative, may I suggest Sun Grid Engine?

good day,
Osman Baris Malcioglu
METU Physics
Ankara

Quoting Bo Peng <bopengchemist at gmail.com>:

> Dear all,
>
> I have recently downloaded and installed PWscf 3.2.2. After installation,
> I am trying to run the examples. My cluster is a Linux PC cluster with MPI
> (mpich) and uses the LSF system to manage jobs. As the manual says (p. 20),
> "... if your machine does not support interactive use, you must run the
> commands specified below through the batch queueing system installed on
> that machine...."
>
> I have used a script to submit the job:
> --------------------------
> #!/bin/bash
> #BSUB -q demo
> #BSUB -J ex01
> #BSUB -R span[ptile=2]
> #BSUB -o %J.log
> #BSUB -a mpich_gm
> #BSUB -c 4800:00
> #BSUB -n 16
>
> pw.x < si.scf.cg.in > si.scf.cg.out
> --------------------------
> errors:
> ...
> <MPICH-GM> Error: Need to obtain the job magic number in GMPI_MAGIC !
> /nfs/s07r2p1/beauchemist/.lsbatch/1181009945.198656.shell: line 10: 27824
> Broken pipe       pw.x < si.scf.cg.in > si.scf.cg.out
>
> Then I changed the script to:
> ---------------------------
> #!/bin/bash
> ...
> #BSUB -n 16
>
> mpirun -np 16 pw.x -npool 8 < si.scf.cg.in > si.scf.cg.out
>
> ---------------------------
> There is no error, but the .out file is empty.
>
> When I changed it to:
> ---------------------------
> #!/bin/bash
> ...
> #BSUB -n 16
>
> mpirun.lsf pw.x < si.scf.cg.in > si.scf.cg.out
> ---------------------------
> error:
> 1 - MPI_COMM_RANK : Null communicator
> [1]  Aborting program !
> [1] Aborting program!
> 0 - MPI_COMM_RANK : Null communicator
> [0]  Aborting program !
> [0] Aborting program!
>
> I do not know what the problem is. Any help is appreciated!
>
> PS: the following is a summary of the make.sys file (other variables are
> empty):
>
> .f90.o:
>        $(MPIF90) $(F90FLAGS) -c $<
>
> .f.o:
>        $(F77) $(FFLAGS) -c $<
>
> .c.o:
>        $(CC) $(CFLAGS)  -c $<
>
>
> DFLAGS         =  -D__INTEL -D__FFTW -D__USE_INTERNAL_FFTW -D__MPI -D__PARA
> FDFLAGS        = $(DFLAGS)
>
> IFLAGS         = -I../include
>
> MODFLAGS       = -I./  -I../Modules  -I../iotk/src \
>                 -I../PW  -I../PH  -I../CPV
>
> MPIF90         = mpif90
> CC             = icc
> F77            = ifort
>
> CPP            = cpp
> CPPFLAGS       = -P -traditional $(DFLAGS) $(IFLAGS)
>
> CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)
> F90FLAGS       = $(FFLAGS) -nomodule -fpp $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
> FFLAGS         = -O2 -tpp6 -assume byterecl
>
> FFLAGS_NOOPT   = -O0 -assume byterecl
>
> BLAS_LIBS      = -L/opt/intel/mkl70/lib/32 -lmkl_ia32 -lguide -lpthread
>
> LAPACK_LIBS    = -lmkl_lapack
>
> MPI_LIBS       = /usr/local/mpich/smp/intel32/ssh/lib/libmpichf90.a
>
> AR             = ar
> ARFLAGS        = ruv
> ARFLAGS_DYNAMIC= ruv
>
> RANLIB         = ranlib
>
> LIBOBJS        = ../flib/ptools.a ../flib/flib.a ../clib/clib.a \
>                  ../iotk/src/libiotk.a
>
> LIBS           = $(LAPACK_LIBS) $(BLAS_LIBS) $(FFT_LIBS) $(MPI_LIBS) \
>                  $(MASS_LIBS) $(PGPLOT_LIBS)
>
