[Pw_forum] Problem
Paolo Giannozzi
giannozz at nest.sns.it
Thu Dec 9 19:50:55 CET 2004
On Thursday 09 December 2004 17:35, Chao Cao wrote:
> Thanks for reply. But does this part of the manual actually apply
> here? Coz the program actually didn't crash, it was running but
> doing weird stuff without any output.
I have occasionally seen this behavior in the past:
- when one MPI process encounters an error, while the others
don't. This was happening on the SP3 in Princeton using k-point
parallelization: if the diagonalization crashed on a specific k-point,
the code hung. It was a problem with diagonalization libraries
that has never been clarified.
- when one MPI process encounters a condition that forces it
to follow a different flow from that of the other processes.
For instance: one process decides that it has converged, while
the others do not. This used to happen on the first T3D: processes
that were doing exactly the same calculation produced slightly
different answers. Of course it was a problem with the operating
system, not with the code.
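The second case can be illustrated with a small sketch. It is a pure-Python simulation, not MPI code and not taken from PWscf: each list entry stands for one process, and the residuals, the threshold and the function names are hypothetical. It only shows why a convergence test evaluated separately on each process can send processes down different branches, and how agreeing on a single flag avoids that.

```python
# Pure-Python simulation (no MPI needed): each list entry plays one process.

def local_decisions(residuals, threshold):
    """Each process decides on its own: True means 'converged, leave the loop'."""
    return [r < threshold for r in residuals]

def collective_decision(residuals, threshold):
    """Every process adopts the logical AND of all local flags.

    In a real MPI code this reduction would typically be done with
    MPI_Allreduce and the MPI_LAND operation.
    """
    agreed = all(local_decisions(residuals, threshold))
    return [agreed] * len(residuals)

# Hypothetical residuals: the same calculation, with slightly different
# rounding on one of the four processes.
residuals = [0.99e-8, 1.01e-8, 0.99e-8, 0.99e-8]
threshold = 1.0e-8

# Flags differ: three processes leave the loop, one keeps iterating and
# then waits forever in its next collective call.
print(local_decisions(residuals, threshold))

# Flags are identical on every process, so all follow the same flow.
print(collective_decision(residuals, threshold))
```

Another common choice is to let one process take the decision and broadcast it to the others. Either way the point is that the branch is taken on a value that is guaranteed to be the same everywhere.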
The only way I know to track down such problems is to follow the
code flow and to put in stops until you understand where and
why things go wrong. Unfortunately this is a time-consuming
process, and the origin of the problem can be quite subtle,
or not even directly related to bugs in the code. Anyway: if
you have a test that hangs reliably, please submit it.
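One possible way to organize the "follow the code flow" step is sketched below, assuming that each process writes a short label to its own log whenever it reaches a marked point in the code. The labels, the traces and the helper name are hypothetical. The sketch compares the per-process traces and reports the first step at which the processes stop agreeing.

```python
def first_divergence(traces):
    """Return (step, labels) for the first step where processes disagree.

    traces maps a process rank to the list of checkpoint labels it
    reached, in order. A process whose trace ended early is reported
    as '<no entry>'. Returns None if all traces are identical.
    """
    longest = max(len(t) for t in traces.values())
    for step in range(longest):
        labels = {
            rank: (t[step] if step < len(t) else "<no entry>")
            for rank, t in traces.items()
        }
        if len(set(labels.values())) > 1:
            return step, labels
    return None

# Hypothetical traces: process 2 enters the diagonalization and never
# comes back, while the others go on.
traces = {
    0: ["scf_start", "diag_begin", "diag_end", "sum_bands"],
    1: ["scf_start", "diag_begin", "diag_end", "sum_bands"],
    2: ["scf_start", "diag_begin"],
}

print(first_divergence(traces))
```

In practice each label has to be flushed to disk as soon as it is written, otherwise the last lines of a hung process may never appear in its log.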
Paolo
--
Paolo Giannozzi e-mail: giannozz at nest.sns.it
Scuola Normale Superiore Phone: +39/050-509876, Fax:-563513
Piazza dei Cavalieri 7 I-56126 Pisa, Italy