[Pw_forum] MPI & disk unavailability

Konstantin Kudin konstantin_kudin at yahoo.com
Mon Nov 28 18:01:43 CET 2005


 Hi,

 I have the following question. On the cluster where I run Espresso
(mostly cp.x), the storage occasionally becomes temporarily
unavailable.

 This causes the jobs to die. It seems that while the output file can
sometimes wait for the storage, the restart files absolutely cannot.

 For example, in a recent case I had only 1 job out of 7 survive disk
unavailability, and I think this is because in the meantime it did not
need to write a restart file. It seems like in this particular case the
output file did wait.

 So my question is: is it possible to make MPI wait for disk
availability when writing the restart files?
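
 To make concrete what "waiting for the disk" could look like at the
application level, here is a minimal sketch in Python. It is not Espresso
or MPI code. It assumes the restart write goes through ordinary file I/O
that reports an error while the storage is unavailable, and the name
write_with_retry and all of its parameters are hypothetical, chosen only
for illustration.

```python
import os
import time


def write_with_retry(path, data, retries=5, delay=1.0, backoff=2.0,
                     sleep=time.sleep):
    """Write bytes to `path`, retrying with growing waits on I/O errors.

    Hypothetical helper: returns the number of attempts that were needed,
    or re-raises the last OSError once `retries` attempts have failed.
    """
    tmp = path + ".tmp"  # write to a scratch name first
    for attempt in range(retries):
        try:
            with open(tmp, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # push the data to the storage
            os.replace(tmp, path)     # swap in the complete file
            return attempt + 1
        except OSError:
            if attempt == retries - 1:
                raise                 # give up after the last attempt
            sleep(delay)              # wait for the storage to return
            delay *= backoff          # wait longer each time
```

 Writing to a temporary name and renaming afterwards means a failed
attempt does not leave a truncated file under the final restart name, so
the previous restart file stays usable while the loop waits.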

 Note that when the disk becomes full, the condition is different, and
the jobs die like this:

 2946  0.04081    0.0  418.3 -1164.45530 -1164.45530 -1163.99768 -1163.50459  0.0000  0.0000 -0.0001  2.0386
bm_list_5117:  p4_error: interrupt SIGx: 15
rm_l_1_5140:  p4_error: interrupt SIGx: 15
rm_l_1_5140: (196778.906250) net_send: could not write to fd=8, errno = 32
p0_5095: (196779.187500) net_send: could not write to fd=7, errno = 32
p3_8888:  p4_error: interrupt SIGx: 13
p2_8887:  p4_error: interrupt SIGx: 13
p2_8887: (196782.304688) net_send: could not write to fd=8, errno = 32
p3_8888: (196782.304688) net_send: could not write to fd=8, errno = 32
p1_5118: (196782.933594) net_send: could not write to fd=8, errno = 32


 Kostya


	
		


