[Pw_forum] Problem

Paolo Giannozzi giannozz at nest.sns.it
Thu Dec 9 10:30:43 CET 2004


On Wednesday 08 December 2004 18:38, Chao Cao wrote:

> I guess what you mean is the program just got stalled,
> without any error, output whatsoever, am I right?

the following items in the manual:
----------------------------------
- pw.x crashes with no error message at all.

This happens quite often in parallel execution, or under a batch
queue, or if you are writing the output to a file.
When the program crashes, part of the output, including the error
message, may be lost, or hidden in error files that nobody looks
at. It is the fault of the operating system, not of the code.
Try to run interactively and to write to the screen.
----------------------------------
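[ The lost-output effect is easy to reproduce outside pw.x. What follows is
  only a sketch in Python, not anything taken from the code or the manual:
  the one-line "program" is a hypothetical stand-in that prints an error
  message and then dies abruptly, with its output redirected to a pipe as it
  would be under a batch queue. The name run_and_capture is made up for the
  illustration. ]

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a crashing program: it prints a message and then
# dies abruptly (os._exit skips the normal flush of buffered output).
CRASH = "import os; print('error: something went wrong'{flush}); os._exit(1)"


def run_and_capture(flush):
    """Run the stand-in with stdout redirected to a pipe; return what arrived."""
    code = CRASH.format(flush=", flush=True" if flush else "")
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # make sure default buffering applies
    result = subprocess.run([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, env=env, text=True)
    return result.stdout


lost = run_and_capture(flush=False)  # message still sits in the buffer at exit
kept = run_and_capture(flush=True)   # message is pushed out before the crash
print("without flush:", repr(lost))
print("with flush:   ", repr(kept))
```

[ When the output goes to a terminal it is usually flushed line by line, which
  is presumably why running interactively makes the message visible. ]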
[ it may also be useful to #define DEBUG in startup.f90: all processes 
  will write their output to file ] and:
----------------------------------
- pw.x runs but nothing happens.

Possible reasons:
 -in parallel execution, the code died on just one processor.
    Unpredictable behavior may follow.
 -in serial execution, the code encountered a floating-point error
    and keeps producing NaNs (Not a Number) forever unless
    exception handling is on (and usually it isn't).
----------------------------------
[ actually the second point may also happen in parallel execution ] 
apply to this case as well
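
To make the NaN point concrete, here is a toy sketch in Python (not Fortran,
and not taken from pw.x). The loop, the function name run and the explicit
isnan check are all assumptions made for the illustration; the check merely
plays the role that floating-point exception handling would play in a
compiled code.

```python
import math


def run(steps, trap):
    """Toy iterative loop; at step 2 an invalid operation (inf - inf) occurs."""
    x = 1.0
    history = []
    for step in range(steps):
        bad = math.inf - math.inf if step == 2 else 0.0  # NaN at step 2 only
        x = 0.5 * x + 1.0 + bad
        if trap and math.isnan(x):
            # Stand-in for exception handling being switched on.
            raise FloatingPointError(f"NaN first appeared at step {step}")
        history.append(x)
    return history


values = run(6, trap=False)
print(values)  # once a NaN appears, every later value is NaN as well

try:
    run(6, trap=True)
except FloatingPointError as err:
    print("stopped:", err)
```

Without the check the loop runs to the end and reports nothing unusual, which
is what "runs but nothing happens" looks like from the outside.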

> It seems that this problem is actually related to MPI, and, somehow,
> to the library you used. I tried to use a serial version of pw, and it
> runs OK. Furthermore, if I abandon MKL, it also runs without stalling.

it looks like a problem with linear algebra routines

Paolo

-- 
Paolo Giannozzi             e-mail:  giannozz at nest.sns.it
Scuola Normale Superiore    Phone:   +39/050-509876, Fax:-563513 
Piazza dei Cavalieri 7      I-56126 Pisa, Italy


