[Pw_forum] ph.x v3.2 on NEC SX-8

Fri Dec 29 17:02:01 CET 2006

Thanks to Axel for the fruitful discussion.

The larger nrxx on the NEC comes from the "good_fft_dimension"
subroutine, which gives nrxx=25x24x37 (=22200) instead of 24x24x36
(=20736) on Intel. I found that I cannot simply ignore the extra "zero"
elements, after trying two things:
1. Masking the overflow error lets the run get past the phq_setup step,
but after the full phonon cycle it hung at some point;
2. Modifying the dmxc_spin subroutine and keeping the extra array elements
zero lets the program run to the end, but the result was wrong.
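
To make the size mismatch concrete, here is a small sketch of the arithmetic. The padding rule (round an even dimension up to the next odd number, leave the second dimension alone) is only an assumption that reproduces the 25x24x37 reported above; the helper names are hypothetical and not taken from the espresso sources.

```python
# Assumed padding rule: even dimensions become odd (24 -> 25, 36 -> 37).
# This is a guess that matches the numbers in this thread, nothing more.
def padded_dimension(n):
    return n + 1 if n % 2 == 0 else n

nr1, nr2, nr3 = 24, 24, 36        # real FFT grid reported in this thread
nrx1 = padded_dimension(nr1)      # 25
nrx2 = nr2                        # 24, unpadded as in the reported nrxx
nrx3 = padded_dimension(nr3)      # 37

nrxx_padded = nrx1 * nrx2 * nrx3  # array size seen on the NEC
nrxx_real = nr1 * nr2 * nr3       # array size seen with ifort

def is_valid_point(ir):
    """True if the 0-based linear index ir (first index fastest, as in
    Fortran) falls on the real grid rather than in the padding."""
    i = ir % nrx1
    j = (ir // nrx1) % nrx2
    k = ir // (nrx1 * nrx2)
    return i < nr1 and j < nr2 and k < nr3

n_valid = sum(is_valid_point(ir) for ir in range(nrxx_padded))
n_padding = nrxx_padded - n_valid
print(nrxx_padded, nrxx_real, n_padding)   # prints: 22200 20736 1464
```

Under this assumption, 1464 of the 22200 array elements are padding and are scattered through the array instead of sitting at the end.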

So far I have not found a way to solve the ph.x problem on the NEC (pw.x
works). I would appreciate it if someone could address this issue.
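
For reference, suggestion (c) in Axel's message quoted below amounts to a guard of the following shape. This is only an illustration of the pattern on a toy one-dimensional grid: the function names are hypothetical, the reciprocal stands in for whatever division fails inside dmxc_spin, and my attempt 2 above, which was similar in spirit, still gave wrong results.

```python
def apply_on_valid_points(values, is_valid, func, fill=0.0):
    """Call func only at indices flagged as real grid points; padding
    elements are never passed to func and receive `fill` instead."""
    return [func(v) if is_valid(ir) else fill
            for ir, v in enumerate(values)]

# Toy grid: 2 real points per row, padded to a leading dimension of 3.
nr1_toy, nrx1_toy = 2, 3

def toy_is_valid(ir):
    return ir % nrx1_toy < nr1_toy

# The zeros sit exactly on the padding elements (indices 2 and 5).
toy_values = [2.0, 4.0, 0.0, 8.0, 1.0, 0.0]

# Stand-in for a routine that divides by its argument.
def reciprocal(x):
    return 1.0 / x

guarded = apply_on_valid_points(toy_values, toy_is_valid, reciprocal)
print(guarded)   # prints: [0.5, 0.25, 0.0, 0.125, 1.0, 0.0]
```

Calling reciprocal on every element of toy_values would raise a ZeroDivisionError at the padding elements; the guard skips them.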

Alternatively, I compiled espresso-2.1.4 on the NEC with the following
makefile, using the internal FFTW and the __INTEL and __LINUX64 flags. In
my test case, both pw.x and ph.x worked properly.
========================================================================
OSHOME=/nfs/home5/HLRS/xol/xolwlyim/PWSCF.2.1.4/espresso-2.1.4.parallel
#
# System-dependent definitions for NEC SX6 - Contributed by Guido Roma
# Edit according to your needs
#
# Precompiler:
#
MAKE=sxmake
CPP = /SX/usr/lib/sxcpp
INC_DIR = ../include

# For fft routines of ASL library
CPPFLAGS = -P -E -DLANGUAGE_FORTRAN -D__INTEL -D__LINUX64 -D__FFTW \
           -D__USE_INTERNAL_FFTW -D__MPI -D__PARA -I$(INC_DIR)
# For libfft library, part of Mathkeisan Libraries 
#CPPFLAGS = -P -E -DLANGUAGE_FORTRAN -DHAS_ZHEGVX -D__SX6 -I$(INC_DIR) 
# For libjmfft library (www.idris.fr) by Jean-Marie Teuler
#CPPFLAGS = -P -E -DZZFFT3D=ccfft3d -DHAS_ZHEGVX -DLANGUAGE_FORTRAN -D__SX6 -I$(INC_DIR)

HOST=-sx8
BASIC=-float0 -P stack $(HOST)
MISC = -I$(INC_DIR) -eab -R5 -Wf" -P nh -ptr byte" -Wf,"-Ncont -A dbl4 "
MISC1 = -I$(INC_DIR) -eab -R5 -Wf" -P nh -ptr byte" -Wf,"-cont -A dbl4 "
PROF=-p
FTRACE=-ftrace
OPT= -C hopt -Wf" -pvctl noifopt loopcnt=9999999 expand=12 fullmsg vwork=stack -fusion -O noif"
OPTVSAFE= -C vsafe -Wf" -pvctl loopcnt=9999999 fullmsg vwork=stack "
OPT0= -C debug 
DEBUG= -g 
DEBUGOPT= -Wf" -init stack=zero heap=zero"

#

AR = sxar
ARFLAGS = rv

# This is needed to tell the compiler where modules are
#
MODULEFLAG= -I$(OSHOME)/Modules -I$(OSHOME)/PW -I$(OSHOME)/PH

#
# Fortran compiler:
#
#
F90 = sxmpif90
F77 = sxmpif90

FFLAGS = $(BASIC) $(MISC) $(OPT) $(DEBUGOPT)
#$(FTRACE)
#FFLAGS = $(BASIC) $(MISC) $(DEBUG) $(DEBUGOPT) $(OPT0) 
#FCAUTIOUS=$(BASIC) $(MISC1) $(DEBUG) $(DEBUGOPT)
F90FLAGS=$(FFLAGS)
F77FLAGS=-f0 $(FFLAGS)

#
# C compiler:
#
#CC = sxc++
#CFLAGS = -DLANGUAGE_C -DNEC -DSX -I$(INC_DIR) -hfloat0,0,acct
CC = sxcc
CCLOCAL=cc
CCFLAGS = -D__INTEL -D__LINUX64 -D__FFTW -D__USE_INTERNAL_FFTW -D__MPI \
          -D__PARA -I$(INC_DIR)

#
# Libraries:
#
# With ASL fft libraries
LIBS = -llapack -lblas 
# With libfft (Mathkeisan) libraries
# be careful, versions <= 1.4 are buggy (zzfft3d), 
#wait for 1.5 (expected end of 2003) 
#LIBS = -llapack -lblas $(OSHOME)/zzfft3d.o -lfft
# You can find the jmfft Cray compatible library written
# by Jean-Marie Teuler on www.idris.fr (search for jmfft)
#LIBS = -llapack -lblas -L$(HOME)/mylocal/lib -ljmfft

#
# Loader flags:
#
LD = $(F90)
#LDFLAGS =  $(BASIC) $(PROF) $(FTRACE) 
LDFLAGS = $(BASIC) $(DEBUG) $(DEBUGOPT) $(OSHOME)/flib/ptools.a \
          $(OSHOME)/flib/flib.a $(OSHOME)/clib/clib.a \
          -p -Wl" -f zero " $(LIBS)
RANLIB         = echo
=============================================================================

Thanks!

Best regards,
William

On Thu, 28 Dec 2006, Axel Kohlmeyer wrote:

> On 12/28/06, wlyim at puccini.che.pitt.edu <wlyim at puccini.che.pitt.edu> wrote:
> > Thanks for your suggestion. I will try one of the examples as soon as
> > possible.
> >
> > Current status: ifort-compiled pw.x and ph.x can complete the job
> > normally. However, the NEC executables pass a larger "nrxx" value, 22200
> > in NEC vs 20736 in Intel, given that nr1=24,nr2=24,nr3=36. So in NEC, some
> 
> that is very interesting.
> 
> > zero "zeta" were passed to dmxc_spin subroutine which led to "divide by
> > zero" error at line 1192 in Modules/functionals.f90. Interestingly, pw.x
> > by sxcross compiler and ifort gave the same scf results, while ph.x in NEC
> > didn't work...
> 
> no surprise here. pw.x does not need the derivatives of the exchange-
> correlation potential.
> 
> > Any suggestion is welcome, e.g. compiler options, preprocessor flags...
> 
> from looking at the code it seems that the relation nrxx=nrx1*nrx2*nrx3 is
> only true in the serial case. see Modules/fft_types.f90 lines 242ff.
> 
> the intel compiler code usually continues with a denormalized number
> (NaN or Inf) after a division by zero (same as IBM xlf) and since the
> corresponding grid point is not accessed this does not propagate.
> 
> to remedy the situation you can try a) compile a serial version of the code,
> b) look for a compiler flag to continue after a denormalized number, or
> c) correct the code in PH/phq_setup.f90 to call dmxc()/dmxc_spin()
> only for values of 'ir' that correspond to valid grid points.
> 
> cheers,
>     axel.
> 
> 
> >
> > Best regards,
> > William
> >
> 
> 
> 

-- 
Dr. Wai-Leung Yim
Institut fuer Reine und Angewandte Chemie,
Theoretische Chemie,
Carl von Ossietzky Universitaet Oldenburg,
26129 Oldenburg,
Germany
Email: wlyim at puccini.che.pitt.edu             
Phone:	+49-441-798-3950 (office)              
	+49-441-798-5102 (home)                
Fax:	+49-441-798-3964