[Pw_forum] error on XD1 platform
Sergey Lisenkov
proffess at yandex.ru
Thu Mar 15 20:13:30 CET 2007
Dear All,
I have a problem running PW-3.2 on the Cray XD1 platform. Sometimes it works, but very often it crashes. It doesn't matter which version of the PGI compiler is used (6.1.1, 6.1.4 or 7.0-2). The code was linked with the ACML library and the FFTW from the QE distribution.
The problem always happens after several ionic steps or during SCF cycles. For example, I see the following in the output file:
.....
Writing output data file XXX.save
Process 0 lost connection: exiting
mpiexec: Error: read_rai_startup_ports: Failed to read barrier entry token from rank 1 process on c645n2.
Process 38 lost connection: exiting
ask 128 got 56 at line 863 in file /var/tmp/mpich-1.2.6/mpid/rai/raifma.c
Process 16 lost connection: exiting
I tried several PW versions, including the CVS one, but the same problem happens. The Cray people don't think it is an MPICH problem.
Has anybody seen this before? Googling didn't help much.
Thanks,
Sergey