[Pw_forum] Problems with example05
Jörg Saßmannshausen
sassmannshausen at tugraz.at
Fri Oct 12 21:19:22 CEST 2007
Dear all,
I have downloaded the latest version of espresso (espresso-3.2.3) and am
currently trying to install it on my AMD X2 machine running Debian Etch
(64-bit). I am using the Intel Fortran Compiler (build 9.1 20060925) and gcc
(build 4.1.2 20061115) for the compilation, mpich2-1.0.6 (built with the
above compilers), and acml3.6.0 as the math library. For FFTW I am using
fftw3.2 (built with the above compilers). Compilation goes well (after I
changed make.sys so that my libraries are actually used) and the example
jobs run fine in serial mode. However, in parallel mode some examples stall.
top shows 100% CPU with both CPUs in use, but I notice a rather high system
load (around 70% sy) and only the remaining part spent in user time (around
30% us). Following Axel Kohlmeyer's suggestion I tried:
- using OpenMPI instead of MPICH (did not solve the problem, but here I get
100% us)
- changing the flags to -O2 -unroll -tpp6 -pc64 (problem still persists)
- changing the flags to -O2 -unroll -pc64 (same; see the make.sys sketch
below)
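For reference, the flag changes amount to editing the Fortran flags in
make.sys; roughly like this (variable names are from my copy of make.sys,
where F90FLAGS inherits $(FFLAGS), so changing FFLAGS is enough; exact
surrounding lines may differ):

# in make.sys (sketch)
FFLAGS = -O2 -unroll -pc64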
I have the same problems on my dual-Xeon Intel P4 machine (using icc instead
of gcc, and using ATLAS plus the internal LAPACK, since the MKL was fine
during compilation but on execution the programs produced "undefined
symbol: __intel_cpu_indicator", and Google told me that the MKL is the
culprit here. Note: ldd did not show any missing libraries).
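In case anybody wants to dig into that: a quick way to see which of the
libraries pp.x pulls in references or provides the symbol (in nm output "U"
means needed but undefined there, "T"/"D" means defined):

# list every shared library pp.x links against and grep each one
# for the Intel CPU-dispatch symbol (paths come from ldd itself)
for lib in $(ldd ../../../bin/pp.x | awk '/=>/ {print $3}'); do
    if nm -D "$lib" 2>/dev/null | grep -q __intel_cpu_indicator; then
        echo "$lib"; nm -D "$lib" | grep __intel_cpu_indicator
    fi
done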
As the run_example script does not give much insight into what is happening
while the program runs, I ran the steps manually.
Here is how I started it, and the output:
zeus:/usr/local/src/espresso-3.2.3/examples/example05/test# /opt/mpich2-1.0.5_icc_ifc/bin/mpiexec -n 4 ../../../bin/pp.x < si.pp_rho.in
Program POST-PROC v.3.2.2 starts ...
Today is 12Oct2007 at 20:40: 3
Parallel version (MPI)
Number of processors in use: 4
R & G space division: proc/pool = 4
Planes per process (thick) : nr3 = 20 npp = 5 ncplane = 400
Proc/  planes  cols     G    planes  cols     G    columns    G
Pool       (dense grid)         (smooth grid)      (wavefct grid)
  1       5     63    683       5     63    683       22    135
  2       5     64    686       5     64    686       21    132
  3       5     63    682       5     63    682       21    132
  4       5     63    682       5     63    682       21    132
  0      20    253   2733      20    253   2733       85    531
nbndx = 4 nbnd = 4 natomwfc = 8 npwx = 90
nelec = 8.00 nkb = 8 ngl = 61
Calling punch_plot, plot_num = 0
Writing data to file sicharge
Reading header from file sicharge
Reading data from file sicharge
Writing data to be plotted to file si.rho.dat
and here the program simply stalls.
Running it under strace gives (last lines only):
recv(5, " Writing data to be plotted "..., 1024, 0) = 51
write(1, " Writing data to be plotted "..., 51 Writing data to be
plotted to file si.rho.dat
) = 51
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
select(7, [4 5 6], [], [], {1, 0}) = 0 (Timeout)
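One can also attach a debugger to the stalled ranks to see where they
actually block; something along these lines should work (assuming gdb and
pgrep are installed; the select() timeouts above look like the process
manager polling, so the interesting backtraces are those of pp.x itself):

# print a backtrace of every running pp.x rank
for pid in $(pgrep pp.x); do
    echo "=== pid $pid ==="
    gdb -batch -ex bt -p $pid
done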
So I am stuck here: two machines, different CPUs, and different MPI
implementations (MPICH and OpenMPI), yet the same problem. Also, some other
example jobs show the same behaviour: fine in serial mode, stalling in
parallel mode.
Does anybody have a good idea? I rather doubt that the problem is with
MPICH, as other programs work fine and OpenMPI produces the same problem, so
I conclude it is not the underlying MPI.
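(To double-check that, a minimal smoke test of the MPI layer alone would be
to build and run the cpi example that ships with the MPICH2 sources; the
source path below is an assumption from my setup, adjust as needed:

cd /usr/local/src/mpich2-1.0.6/examples
/opt/mpich2-1.0.5_icc_ifc/bin/mpicc cpi.c -o cpi
/opt/mpich2-1.0.5_icc_ifc/bin/mpiexec -n 4 ./cpi

If that completes normally, the MPI layer itself should be fine.)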
Any (useful) comments are appreciated.
Many thanks
Jörg
P.S. The same happens if I use the internal BLAS/LAPACK :-(
--
*************************************************************
Jörg Saßmannshausen
Institut für chemische Technologie organischer Stoffe
TU-Graz
Stremayrgasse 16
8010 Graz
Austria
phone: +43 (0)316 873 8954
fax: +43 (0)316 873 4959
homepage: http://sassy.formativ.net/