[Pw_forum] MPI & disk unavailability

Gerardo Ballabio g.ballabio at cineca.it
Mon Nov 28 18:34:54 CET 2005


On 11/28/2005 06:01:43 PM, Konstantin Kudin wrote:
> I have the following question. Occasionally on the cluster I am  
> running Espresso (mostly cp.x) the storage becomes temporarily  
> unavailable.
> 
> This causes the jobs to die. It seems that while the output file  
> can sometimes wait for the storage, the restart files absolutely  
> cannot.

> So my question is if it is possible to make MPI wait for disk  
> availability when writing the restart files?

I suspect this is handled at the operating system level. When the
operating system "talks" to the disk and gets no reply, it can either
keep waiting, or give up and report failure. There is probably a
configurable timeout. If it's a networked filesystem, the filesystem
daemons are most likely responsible for that.

Actually, unless this has changed recently, Espresso doesn't use the
MPI I/O functions: all I/O is handled by cpu 0, which reads and writes
locally. The "local" disk may be (and most often is) a networked
filesystem; but again, this is handled by the operating system, and is
completely transparent to Espresso.
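
Since the write goes through ordinary operating system calls, an
application can only react once the operating system actually reports a
failure. A minimal sketch of such a retry loop, in Python rather than
Espresso's own code, follows. The function name write_with_retry, its
parameters, and the retry policy are hypothetical and only illustrate
the idea; nothing like this is claimed to exist in Espresso.

```python
import os
import time

def write_with_retry(path, data, attempts=5, delay=1.0, opener=open):
    """Write data to path, retrying when the OS reports an I/O failure.

    Hypothetical helper: returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(path, "wb") as f:
                f.write(data)
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to push it to storage
            return attempt
        except OSError:
            if attempt == attempts:
                raise                  # give up and report the failure
            time.sleep(delay)          # wait before trying again
```

This helps only when the failure is reported as an error. If the
filesystem is mounted so that calls block until the server answers, the
write simply waits and the loop never gets a chance to retry.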

I also guess that output behaves differently than input because it is  
buffered.
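
The effect of buffering can be seen in a small Python sketch (again not
Espresso code; the function name, payload, and buffer size are arbitrary
assumptions). A small write stays in the program's buffer, so the file
is not touched at all until the buffer is flushed:

```python
import os
import tempfile

def sizes_before_and_after_flush(payload=b"step 1\n", bufsize=65536):
    """Return the file size seen before and after flushing a buffered write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb", buffering=bufsize) as f:
            f.write(payload)                 # stays in the user-space buffer
            before = os.path.getsize(path)   # nothing has reached the file yet
            f.flush()                        # hand the data to the operating system
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)
```

A program that writes small amounts of buffered output therefore may not
touch the filesystem during a short outage, whereas a large write that
is flushed immediately will.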

Gerardo



