[Pw_forum] problem about MPI
mbaris at metu.edu.tr
Tue Jun 5 07:46:53 CEST 2007
Dear Bo Peng,
A quick Google search on the problem turns up
http://www.myri.com/serve/cache/236.html
That is, the secondary script that generates the dynamic port number
used in MPI intercommunication is not working. This is most probably an
installation issue regarding LSF. Or do you have to call such a script
yourself in the submit batch file?
In the second case, are you sure you are using the Myrinet version of
MPI? (I suppose you are on Myrinet, as the first error suggests.)
Besides that, it is always a very bad idea to try to bypass the tight
integration of a grid engine, since admins often terminate runaway jobs
on the nodes automatically. I am also not sure you are bypassing it
correctly in that script. Are you calling mpirun twice?
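
One way to look into the "Myrinet version" question is to check which
MPI library pw.x was linked against. The following is only a sketch: the
helper names are made up for illustration, and guessing the device from
the install path is a heuristic that depends on how your site names its
MPICH directories. With the MPI_LIBS path quoted below
(/usr/local/mpich/smp/intel32/ssh/...), it reports a non-GM build, which
would be worth confirming with your admins if jobs are submitted with
-a mpich_gm.

```shell
#!/bin/bash
# Hypothetical helper: extract the value of MPI_LIBS from a make.sys-style file.
mpi_libs_from_makesys() {
    sed -n 's/^[[:space:]]*MPI_LIBS[[:space:]]*=[[:space:]]*//p' "$1"
}

# Hypothetical helper: guess the MPICH device from the library path.
# This only looks at directory naming (an assumption about site conventions);
# it does not inspect the library itself.
guess_mpi_device() {
    case "$1" in
        *mpich-gm*|*mpich_gm*|*/gm/*) echo "gm" ;;
        *)                            echo "non-gm" ;;
    esac
}

# Report on make.sys in the current directory, if there is one.
if [ -f make.sys ]; then
    libs=$(mpi_libs_from_makesys make.sys)
    echo "MPI_LIBS:       $libs"
    echo "guessed device: $(guess_mpi_device "$libs")"
fi
```

Run it from the directory that contains make.sys. If the guess is
"non-gm" while the job is submitted through the GM integration, the
build and the launcher may not match.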
I think this problem is not related to espresso; rather, it is an LSF
issue at your site.
If you are looking for an alternative, may I suggest Sun Grid Engine?
Good day,
Osman Baris Malcioglu
METU Physics
Ankara
Quoting Bo Peng <bopengchemist at gmail.com>:
> Dear all,
>
> I have recently downloaded and installed PWscf 3.2.2. After installation,
> I tried to run the examples. My cluster is a Linux PC cluster with MPI
> (mpich) and uses the LSF system to manage jobs. As the manual says (p. 20):
> "... if your machine does not support interactive use, you must run the
> commands specified below through the batch queueing system installed on
> that machine...."
>
> I have used a script to submit the job:
> --------------------------
> #!/bin/bash
> #BSUB -q demo
> #BSUB -J ex01
> #BSUB -R span[ptile=2]
> #BSUB -o %J.log
> #BSUB -a mpich_gm
> #BSUB -c 4800:00
> #BSUB -n 16
>
> pw.x < si.scf.cg.in > si.scf.cg.out
> --------------------------
> errors:
> ...
> <MPICH-GM> Error: Need to obtain the job magic number in GMPI_MAGIC !
> /nfs/s07r2p1/beauchemist/.lsbatch/1181009945.198656.shell: line 10: 27824
> Broken pipe pw.x < si.scf.cg.in > si.scf.cg.out
>
> Then I changed the script to:
> ---------------------------
> #!/bin/bash
> ...
> #BSUB -n 16
>
> mpirun -np 16 pw.x -npool 8 < si.scf.cg.in > si.scf.cg.out
>
> ---------------------------
> There is no error, but the .out file is empty.
>
> When I change it to:
> ---------------------------
> #!/bin/bash
> ...
> #BSUB -n 16
>
> mpirun.lsf pw.x < si.scf.cg.in > si.scf.cg.out
> ---------------------------
> error:
> 1 - MPI_COMM_RANK : Null communicator
> [1] Aborting program !
> [1] Aborting program!
> 0 - MPI_COMM_RANK : Null communicator
> [0] Aborting program !
> [0] Aborting program!
>
> I do not know what is going on. Any help is appreciated!
>
> PS: the following is a summary of the make.sys file (other variables are
> empty):
>
> .f90.o:
> $(MPIF90) $(F90FLAGS) -c $<
>
> .f.o:
> $(F77) $(FFLAGS) -c $<
>
> .c.o:
> $(CC) $(CFLAGS) -c $<
>
>
> DFLAGS = -D__INTEL -D__FFTW -D__USE_INTERNAL_FFTW -D__MPI -D__PARA
> FDFLAGS = $(DFLAGS)
>
> IFLAGS = -I../include
>
> MODFLAGS = -I./ -I../Modules -I../iotk/src \
> -I../PW -I../PH -I../CPV
>
> MPIF90 = mpif90
> CC = icc
> F77 = ifort
>
> CPP = cpp
> CPPFLAGS = -P -traditional $(DFLAGS) $(IFLAGS)
>
> CFLAGS = -O3 $(DFLAGS) $(IFLAGS)
> F90FLAGS = $(FFLAGS) -nomodule -fpp $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
> FFLAGS = -O2 -tpp6 -assume byterecl
>
> FFLAGS_NOOPT = -O0 -assume byterecl
>
> BLAS_LIBS = -L/opt/intel/mkl70/lib/32 -lmkl_ia32 -lguide -lpthread
>
> LAPACK_LIBS = -lmkl_lapack
>
> MPI_LIBS = /usr/local/mpich/smp/intel32/ssh/lib/libmpichf90.a
>
> AR = ar
> ARFLAGS = ruv
> ARFLAGS_DYNAMIC= ruv
>
> RANLIB = ranlib
>
> LIBOBJS = ../flib/ptools.a ../flib/flib.a ../clib/clib.a ../iotk/src/libiotk.a
>
> LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FFT_LIBS) $(MPI_LIBS) $(MASS_LIBS) $(PGPLOT_LIBS)
>