[Pw_forum] PW 2.0 - problems with parallelization

Wed Mar 17 01:21:01 CET 2004

Dear PW users,

Here is the situation:
Last week I installed the new release of pwscf. Using the old
configuration procedure ("./configure.old"), I was able to compile both
in single-processor mode and in parallel. However, while the
single-processor version seems to work, the parallel one fails even on
simple things like the Si part of "example1" from pw_examples (see
input script in attachment). The self-consistent calculation works, but
the non self-consistent one behaves strangely:

- sometimes pw just "hangs" without being killed: the CPUs go idle
and pw stops writing to the output file after the following lines:
[...]
     nbndx  =    32  nbnd   =     8  natomwfc =     8  npwx   =     168
     nelec  =    8.00 nkb   =     8  ngl    =      65

- other times (depending on the energy cutoff!), odd errors appear at
the end of the non self-consistent output file, like:

[...]
     nbndx  =    32  nbnd   =     8  natomwfc =     8  npwx   =     168
     nelec  =    8.00 nkb   =     8  ngl    =      65

     The initial potential is read from file    silicon.pot
     Starting wfc are atomic

MPI_Recv: message truncated (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
Rank (1, MPI_COMM_WORLD):  - MPI_Bcast()
Rank (1, MPI_COMM_WORLD):  - MPI_Allreduce()
Rank (1, MPI_COMM_WORLD):  - main()

and the output stops there.

(If the cutoff energy is set to 20 Ryd, the first case happens, whereas
the second case happens for a cutoff of 18 Ryd.)

The same example works normally if the non self-consistent calculation is
done in single-processor mode instead of in parallel.

The operating system is Linux, running on a PC cluster (dual Xeon). The
same kind of problem occurs when compiling with either ifc 6.0 or ifc
7.0, and with either mkl 5.1 or mkl 6.1. We also use the mpif77 compiler
and LAM 7.0.3/MPI 2 C++ for the parallel build (older versions seem
to give the same problem). The FFTW environment used is local.

Note that the old version of pw (1.3.1) worked perfectly well.

An additional fact (which may be of no relevance) is that the new
configuration procedure ("./configure") is not able to detect the
parallel environment; in particular, it cannot find "zggev", "dgemm" and
"mpi_init" in the various libraries. Nevertheless, the old configuration
procedure ("./configure.old") leads to compilation without any problem.

Thanks in advance,

Best regards,

Nicolas

PS: I have also attached the make file used (which compiles well).

-- 
Nicolas Mounet
Prof. Marzari's Group
Department of Materials Science and Engineering
13-4084              
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge MA 02139 
USA

Tel: (+1)617-253-6026

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: input.txt
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20040316/c67ba320/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: make.sys.txt
URL: <http://lists.quantum-espresso.org/pipermail/users/attachments/20040316/c67ba320/attachment-0001.txt>