[Pw_forum] Problems with example05

Jörg Saßmannshausen sassmannshausen at tugraz.at
Fri Oct 12 21:19:22 CEST 2007


Dear all,

I have downloaded the latest version of espresso (espresso-3.2.3) and I am 
currently trying to install it on my AMD X2 machine, running Debian Etch 
(64bit). I am using the Intel Fortran Compiler (build 9.1 20060925) and gcc 
(build 4.1.2 20061115) for the compilation, mpich2-1.0.6 (built with the above 
compilers) and acml3.6.0 as the math library. For the FFTs I am using fftw3.2 
(also built with the above compilers). Compilation goes well (after I changed 
make.sys so that my libraries are actually used; an illustrative excerpt 
follows the list below) and the example jobs run fine in serial mode. However, 
in parallel mode some examples stall. Top shows 100% CPU with both cores in 
use, but the load is mostly system time (around 70% sy) and only the remaining 
part is user time (around 30% us). Following Axel Kohlmeyer's suggestion I tried:
- using OpenMPI instead of MPICH (did not solve the problem, but here I get 
100% us)
- changing the compiler flags to -O2 -unroll -tpp6 -pc64 (problem still persists)
- changing the compiler flags to -O2 -unroll -pc64 (same)
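
For reference, the make.sys changes amounted to roughly the following; the 
library paths are only examples from my setup and will differ on other machines:

BLAS_LIBS   = -L/opt/acml3.6.0/ifort64/lib -lacml
LAPACK_LIBS = $(BLAS_LIBS)
FFT_LIBS    = -L/usr/local/lib -lfftw3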

I have the same problem on my Intel dual-Xeon P4 machine (there using icc 
instead of gcc, and ATLAS plus the internal LAPACK, since a build against the 
MKL compiled fine but on execution of the programs produced "undefined 
symbol: __intel_cpu_indicator", and Google told me that the MKL is the culprit 
here. Note: ldd did not show any missing libraries).

As the run_example script does not really give much insight into what is 
happening while the program runs, I went through the steps manually. 
Here is how I started it, together with the output:
zeus:/usr/local/src/espresso-3.2.3/examples/example05/test# /opt/mpich2-1.0.5_icc_ifc/bin/mpiexec -n 4 ../../../bin/pp.x < si.pp_rho.in

     Program POST-PROC v.3.2.2  starts ...
     Today is 12Oct2007 at 20:40: 3

     Parallel version (MPI)

     Number of processors in use:       4
     R & G space division:  proc/pool =    4

     Planes per process (thick) : nr3 = 20 npp =   5 ncplane =  400

 Proc/  planes cols    G   planes cols    G    columns  G
 Pool       (dense grid)      (smooth grid)   (wavefct grid)
  1      5     63    683    5     63    683   22    135
  2      5     64    686    5     64    686   21    132
  3      5     63    682    5     63    682   21    132
  4      5     63    682    5     63    682   21    132
  0     20    253   2733   20    253   2733   85    531


     nbndx  =     4  nbnd   =     4  natomwfc =     8  npwx   =      90
     nelec  =   8.00  nkb   =     8  ngl    =      61

     Calling punch_plot, plot_num =   0
     Writing data to file  sicharge
     Reading header from file  sicharge
     Reading data from file  sicharge

     Writing data to be plotted to file si.rho.dat
and here the program simply stalls.

Running it with strace gives (last lines only)

recv(5, "     Writing data to be plotted "..., 1024, 0) = 51
write(1, "     Writing data to be plotted "..., 51     Writing data to be 
plotted to file si.rho.dat
) = 51
select(7, [4 5 6], [], [], {1, 0})      = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0})      = 0 (Timeout)
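
In case somebody wants to reproduce or extend the trace: the stalled pp.x 
processes can also be attached to directly (one strace per PID, where <pid> is 
just a placeholder), or the whole launch can be traced while following child 
processes, with an arbitrary output file prefix:

strace -p <pid-of-a-pp.x-process>

strace -f -ff -o pp_trace /opt/mpich2-1.0.5_icc_ifc/bin/mpiexec -n 4 ../../../bin/pp.x < si.pp_rho.in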

So I am stuck here: two machines, different CPUs, different MPI 
implementations (MPICH and OpenMPI), same problem. Also, some other example 
jobs show the same behaviour: fine in serial mode, stalling in parallel mode.

Does anybody have a good idea? I somewhat doubt that the problem is with 
MPICH, since other programs work fine and OpenMPI shows the same problem, so I 
conclude it is not the underlying MPI.

Any (useful) comments are appreciated.

Many thanks

Jörg


P.S. The same happens if I am using the internal BLAS/LAPACK :-(

-- 
*************************************************************
Jörg Saßmannshausen
Institut für chemische Technologie organischer Stoffe
TU-Graz
Stremayrgasse 16
8010 Graz
Austria

phone: +43 (0)316 873 8954
fax: +43 (0)316 873 4959
homepage: http://sassy.formativ.net/


