[Pw_forum] Problem

Paolo Giannozzi giannozz at nest.sns.it
Thu Dec 9 19:50:55 CET 2004


On Thursday 09 December 2004 17:35, Chao Cao wrote:

> Thanks for reply. But does this part of the manual actually apply
> here? Coz the program actually didn't crash, it was running but 
> doing weird stuff without any output.

I have occasionally seen this behavior in the past:

- when one MPI process encounters an error while the others
  don't. This happened on the SP3 in Princeton with k-point
  parallelization: if the diagonalization crashed on a specific
  k-point, the code hung. It was a problem with the diagonalization
  libraries that was never clarified.

- when one MPI process encounters a condition that forces it to
  follow a different flow from that of the other processes.
  For instance: one process thinks that it has converged, while
  the others don't. This used to happen on the first T3D: processes
  that were doing exactly the same calculation produced slightly
  different answers. Of course it was a problem of the operating
  system, not of the code.
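
To make the second case concrete, here is a minimal sketch in plain Python.
It is only a simulation: a list stands in for the MPI processes, no MPI
library is used, and the error values, the threshold and the function names
are all hypothetical. It shows how per-process convergence tests can disagree
when the results differ in the last digits, and how taking one shared verdict
(in real MPI code, something like a logical-AND reduction over all processes)
keeps every process on the same flow.

```python
# Simulation only: a Python list stands in for MPI ranks; no MPI is used.

def local_decisions(errors, tol):
    """Each rank tests its own error estimate against the threshold."""
    return [err < tol for err in errors]

def agreed_decision(errors, tol):
    """Every rank adopts one shared verdict: converged only if all ranks
    say so. In real MPI code this would be a logical-AND reduction."""
    verdict = all(local_decisions(errors, tol))
    return [verdict] * len(errors)

# Hypothetical error estimates from four ranks doing the same calculation.
# They differ only in the last digits but straddle the threshold.
tol = 1.0e-8
errors = [0.99999e-8, 1.00001e-8, 0.99998e-8, 1.00002e-8]

naive = local_decisions(errors, tol)    # ranks disagree: some would exit
shared = agreed_decision(errors, tol)   # all ranks take the same branch

print("local tests :", naive)
print("shared test :", shared)
```

With the local tests, ranks 0 and 2 would leave the loop while ranks 1 and 3
keep iterating and wait forever in the next collective call. With the shared
verdict all ranks continue together.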

The only way I know to track down such problems is to follow the
code flow and to insert stops until you understand where and
why things go wrong. Unfortunately it is a time-consuming
process, and the origin of the problem can be quite subtle,
or not even directly related to bugs in the code. Anyway: if
you have a test that hangs reliably, please submit it.
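
One way to follow the code flow across processes is to write a rank-tagged
marker at each suspect point, flush it immediately so that it survives a
hang, and then look at the last marker each process wrote. The sketch below
is plain Python with hypothetical names (`checkpoint`, `last_checkpoints`)
and hypothetical labels. It is not a facility of the code discussed here,
only an illustration of the idea.

```python
import io
import sys

def checkpoint(rank, label, stream=sys.stderr):
    """Write a rank-tagged marker and flush it, so the line is not lost
    in a buffer if the process hangs afterwards."""
    stream.write(f"[rank {rank}] reached {label}\n")
    stream.flush()

def last_checkpoints(lines):
    """Return the last label written by each rank, given the trace lines."""
    last = {}
    for line in lines:
        if line.startswith("[rank ") and "] reached " in line:
            head, label = line.split("] reached ", 1)
            last[int(head[len("[rank "):])] = label.strip()
    return last

# Hypothetical trace: rank 1 never gets past the diagonalization.
trace = io.StringIO()
checkpoint(0, "diagonalization", trace)
checkpoint(1, "diagonalization", trace)
checkpoint(0, "convergence test", trace)

stuck = last_checkpoints(trace.getvalue().splitlines())
print(stuck)
```

The rank whose last marker lags behind the others is the place to insert
the next, finer-grained set of markers.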

Paolo

-- 
Paolo Giannozzi             e-mail:  giannozz at nest.sns.it
Scuola Normale Superiore    Phone:   +39/050-509876, Fax:-563513 
Piazza dei Cavalieri 7      I-56126 Pisa, Italy


