[Pw_forum] error on XD1 platform
Andrea Ferretti
ferretti.andrea at unimore.it
Fri Mar 16 00:23:50 CET 2007
Hi all,
> Dear All,
>
> I have a problem with running PW-3.2 on Cray XD1 platform. Sometimes it works,
> but very often it crashes. It doesn't matter which version of PGI
> compiler is used (6.1.1, 6.1.4 or 7.0-2). The code was linked with
> ACML library and FFTW from QE distribution.
>
I ran into similar-sounding problems, also on a Cray XD1 machine (PGI
compiler 6.0). I used both espresso-3.2 and the CVS version (from around one
month ago). Some calculations ran fine, while others (particularly heavy
ones) sometimes worked and sometimes did not. The crashes came either at the
beginning of the calculation or after a number of scf iterations.
First, I discovered (or rather, someone told me about) the existence in
espresso of a preprocessor flag, __XD1, which is intended to fix some
trouble in the MPI parts (trouble with PGI, I guess).
I don't understand the errors you got, but since they seem to be
somehow related to MPI, I would try adding the flag
-D__XD1
to DFLAGS and FDFLAGS in the make.sys file.
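A minimal sketch of that edit, assuming make.sys already defines both
variables. The text "<existing flags>" is only a placeholder for whatever
configure wrote on those lines on your machine, not a real option:

```make
# make.sys (fragment): a sketch, not a complete file.
# <existing flags> stands for whatever is already on these lines;
# leave those entries untouched. Only -D__XD1 is new.
DFLAGS  = <existing flags> -D__XD1
FDFLAGS = <existing flags> -D__XD1
```

Since this is a preprocessor flag, the sources have to be recompiled
before it has any effect; a clean rebuild is the safe way to do that.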
In my case, however, adding __XD1 did not solve the problem.
After some feedback from the administrator, it seemed to be due to
instabilities of the system itself (especially related to memory management).
cheers
andrea
>
> Dear All,
>
> I have a problem with running PW-3.2 on Cray XD1 platform. Sometimes it works,
> but very often it crashes. It doesn't matter which version of PGI
> compiler is used (6.1.1, 6.1.4 or 7.0-2). The code was linked with
> ACML library and FFTW from QE distribution.
>
> The problem always happens after several ionic steps or during scf
> cycles. For example, I see the following in the output file:
>
> .....
> Writing output data file XXX.save
> Process 0 lost connection: exiting
> mpiexec: Error: read_rai_startup_ports: Failed to read barrier entry token from rank 1 process on c645n2.
>
> Process 38 lost connection: exiting
> ask 128 got 56 at line 863 in file /var/tmp/mpich-1.2.6/mpid/rai/raifma.cPProcess 16 lost connection: exiting
>
>
--
Andrea Ferretti
National Research Center S3, CNR-INFM ( http://s3.infm.it )
Dipartimento di Fisica, Universita' di Modena e Reggio Emilia
Via Campi 213/A I-41100 Modena, Italy
Tel: +39 059 2055301 Fax: +39 059 374794
Skype: andrea_ferretti
URL: http://www.nanoscience.unimo.it
Please, if possible, don't send me MS Word or PowerPoint attachments
Why? See: http://www.gnu.org/philosophy/no-word-attachments.html