[Pw_forum] Fatal error in PMPI_Group_incl, possibly related to ScaLAPACK libraries

Paolo Giannozzi p.giannozzi at gmail.com
Tue Dec 6 17:19:07 CET 2016


I am not convinced that the problem mentioned in that thread is the same as
yours. In order to figure out whether the problem arises from ScaLAPACK, you
should remove __SCALAPACK from DFLAGS and recompile: the code will use (much
slower) internal routines for parallel dense-matrix diagonalization. You may
also try to run with no dense-matrix diagonalization (-nd 1; I am not sure
it is honored, though).
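
For instance, something along these lines (a sketch based on the make.sys
you posted; the run command and file names are just examples):

    # in make.sys, drop -D__SCALAPACK from DFLAGS:
    DFLAGS = -D__INTEL -D__FFTW3 -D__MPI -D__PARA
    # then rebuild from scratch:
    make clean ; make pw
    # or, keeping the current build, try a single process for
    # dense-matrix diagonalization at run time:
    mpirun -np 32 pw.x -nd 1 -inp pw.in > pw.out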

You should also report how you are running your code and, if you are using
exotic parallelizations such as "band groups" (-nb N), check whether the
problem is related to their usage.
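
That is, post the complete command line, e.g. (hypothetical):

    mpirun -np 32 pw.x -nb 2 -nd 4 -inp pw.in > pw.out

so that one can see how many MPI processes you start and which
parallelization levels (-nk, -nb, -nd, ...) are in use.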

Paolo


On Thu, Dec 1, 2016 at 11:37 PM, Ryan Herchig <rch at mail.usf.edu> wrote:

> Hello all,
>
>     I am running pw.x from Quantum ESPRESSO version 5.4.0; however, if I
> try to run the job using more than 2 nodes with 8 cores each, I receive
> the following error:
>
> Fatal error in PMPI_Group_incl: Invalid rank, error stack:
> PMPI_Group_incl(185).............: MPI_Group_incl(group=0x88000004, n=4,
> ranks=0x2852700, new_group=0x7fff57564668) failed
> MPIR_Group_check_valid_ranks(253): Invalid rank in rank array at index 3;
> value is 33 but must be in the range 0 to 31
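>
> (If I read the message correctly, pw.x builds a rank list for a subgroup
> and one entry, 33, falls outside the 32-process communicator.  A minimal
> standalone example, not taken from the QE sources, that triggers the same
> class of error on 32 processes would be:
>
>   program group_incl_demo
>     use mpi
>     implicit none
>     integer :: ierr, nproc, world_group, new_group
>     integer :: ranks(4)
>     call MPI_Init(ierr)
>     call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
>     call MPI_Comm_group(MPI_COMM_WORLD, world_group, ierr)
>     ! 33 is out of range when nproc = 32, as in the error above
>     ranks = (/ 0, 8, 16, 33 /)
>     call MPI_Group_incl(world_group, 4, ranks, new_group, ierr)
>     call MPI_Finalize(ierr)
>   end program group_incl_demo
>
> run with mpirun -np 32.)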
>
> I am building and running on a local cluster maintained by the university
> I attend.  Each node has 2 x Intel Xeon E5-2670 (eight cores each), 32 GB
> of RAM, and a QDR InfiniBand interconnect.  I found a previous thread
>
> https://www.mail-archive.com/pw_forum@pwscf.org/msg27702.html
>
> involving espresso-5.3.0 in which another user seemed to be experiencing
> the same issue; there it was determined that "The problem is related to
> the obscure hacks needed to convince Scalapack to work in a subgroup of
> processors."  The suggestion in that post was to change a line in
> Modules/mp_global.f90 and recompile.  However, I am running spin-collinear
> vdW-DF calculations, which I believe require at least version 5.4.0, and
> the lines in the relevant subroutine in mp_global.f90 have changed;
> furthermore, following the suggestion of the previous post does not fix
> the issue.  Instead, it produces the following compilation error:
>
> mp_global.f90(97): error #6631: A non-optional actual argument must be
> present when invoking a procedure with an explicit interface.
> [NPARENT_COMM]
>     CALL mp_start_diag  ( ndiag_, intra_BGRP_comm )
> ---------^
> mp_global.f90(97): error #6631: A non-optional actual argument must be
> present when invoking a procedure with an explicit interface.
> [MY_PARENT_ID]
>     CALL mp_start_diag  ( ndiag_, intra_BGRP_comm )
> ---------^
> compilation aborted for mp_global.f90 (code 1)
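>
> From the dummy-argument names in the error messages (NPARENT_COMM,
> MY_PARENT_ID) it looks like mp_start_diag in 5.4.0 expects two additional
> actual arguments, so presumably the call would have to be something like
> (my guess from the dummy names, not verified against the sources):
>
>     CALL mp_start_diag ( ndiag_, intra_BGRP_comm, nparent_comm, my_parent_id )
>
> but I do not know what the correct values for the extra arguments would
> be in this context.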
>
>
> Does this problem with the ScaLAPACK libraries persist in the newer
> versions, or could these errors have a separate origin?  Could it be
> something I am doing wrong during the build?  I have included below the
> make.sys that I am using for "make pw".  If the error is due to the
> ScaLAPACK libraries, is there a workaround that would allow the use of
> additional processors when running calculations?  Thank you in advance.
>
>                            Thank you, Ryan Herchig
>
>                            University of South Florida, Department of Physics
>
>
> .SUFFIXES :
> .SUFFIXES : .o .c .f .f90
>
> .f90.o:
>     $(MPIF90) $(F90FLAGS) -c $<
>
> # .f.o and .c.o: do not modify
>
> .f.o:
>     $(F77) $(FFLAGS) -c $<
>
> .c.o:
>     $(CC) $(CFLAGS)  -c $<
>
> TOPDIR = /work/r/rch/espresso-5.4.0
>
> MANUAL_DFLAGS  =
> DFLAGS         =  -D__INTEL -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK
> FDFLAGS        = $(DFLAGS) $(MANUAL_DFLAGS)
>
> IFLAGS         = -I../include -I/apps/intel/2015/composer_xe_2015.3.187/mkl/include:/apps/intel/2015/composer_xe_2015.3.187/tbb/include
>
> MOD_FLAG      = -I
>
> MPIF90         = mpif90
> #F90           = ifort
> CC             = icc
> F77            = ifort
>
> CPP            = cpp
> CPPFLAGS       = -P -C -traditional $(DFLAGS) $(IFLAGS)
>
> CFLAGS         = -O3 $(DFLAGS) $(IFLAGS)
> F90FLAGS       = $(FFLAGS) -nomodule -fpp $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
> FFLAGS         = -O2 -assume byterecl -g -traceback
>
> FFLAGS_NOOPT   = -O0 -assume byterecl -g -traceback
>
> FFLAGS_NOMAIN   = -nofor_main
>
> LD             = mpif90
> LDFLAGS        =
> LD_LIBS        =
>
> BLAS_LIBS      = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
> BLAS_LIBS_SWITCH = external
>
> LAPACK_LIBS    = -L/apps/intel/2015/composer_xe_2015.3.187/mkl/lib/intel64
> -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
> LAPACK_LIBS_SWITCH = external
>
> ELPA_LIBS_SWITCH = disabled
> SCALAPACK_LIBS = -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_ilp64
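> # (should this be -lmkl_blacs_intelmpi_lp64, to match the lp64 MKL
> # libraries used everywhere else in this file?  mixing the ILP64 BLACS
> # with LP64 ScaLAPACK/BLAS could corrupt integer arguments)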
>
> FFT_LIBS       = -L/apps/intel/2015/composer_xe_2015.3.187/mkl/lib/intel64
> -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
>
> MPI_LIBS       =
>
> MASS_LIBS      =
>
> AR             = ar
> ARFLAGS        = ruv
>
> RANLIB         = ranlib
>
> FLIB_TARGETS   = all
>
> LIBOBJS        = ../clib/clib.a ../iotk/src/libiotk.a
> LIBS           = $(SCALAPACK_LIBS) $(LAPACK_LIBS) $(FFT_LIBS) $(BLAS_LIBS)
> $(MPI_LIBS) $(MASS_LIBS) $(LD_LIBS)
>
> WGET = wget -O
>
> PREFIX = /work/r/rch/espresso-5.4.0/EXE
>
>
>
>
>
>
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222