[Pw_forum] MPI & disk unavailability
Konstantin Kudin
konstantin_kudin at yahoo.com
Mon Nov 28 18:01:43 CET 2005
Hi,
I have the following question. Occasionally, on the cluster where I
run Espresso (mostly cp.x), the storage becomes temporarily
unavailable.
This causes the jobs to die. It seems that while the output file can
sometimes wait for the storage, the restart files absolutely cannot.
For example, in a recent case only 1 job out of 7 survived the disk
unavailability, and I think this is because it did not need to write a
restart file in the meantime. In that particular case the output file
apparently did wait.
So my question is: is it possible to make MPI wait for disk
availability when writing the restart files?
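
As an illustration of the kind of "waiting" being asked about, here is a
minimal sketch in Python. It is not Espresso code and not an MPI feature:
the helper name write_with_retry and its parameters (retries, delay,
backoff, sleep) are hypothetical, and it assumes that an unavailable disk
shows up as an OSError on open or write. It writes to a temporary file in
the same directory and renames it into place, so a failed attempt should
not leave a half-written restart file under the final name.

```python
import os
import tempfile
import time


def write_with_retry(path, data, retries=5, delay=1.0, backoff=2.0,
                     sleep=time.sleep):
    """Write bytes to `path`, retrying when the storage raises OSError.

    Hypothetical helper for illustration only. Returns the number of
    failed attempts that preceded the successful write.
    """
    directory = os.path.dirname(path) or "."
    for attempt in range(retries + 1):
        tmp_path = None
        try:
            # Write to a temporary file next to the target first.
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
                handle.flush()
                os.fsync(handle.fileno())
            # Rename into place only after the data is fully written.
            os.replace(tmp_path, path)
            return attempt
        except OSError:
            # Best-effort cleanup of the partial temporary file.
            if tmp_path is not None and os.path.exists(tmp_path):
                try:
                    os.remove(tmp_path)
                except OSError:
                    pass
            if attempt == retries:
                raise  # Out of retries: let the caller see the error.
            sleep(delay)
            delay *= backoff  # Wait longer before each new attempt.
```

With the defaults this waits 1, 2, 4, 8 and 16 seconds between attempts
before giving up. Whether anything similar can be hooked into the way cp.x
writes its restart files is exactly the open question.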
Note that when the disk becomes full, the condition is different, and the
jobs die like this:
2946 0.04081 0.0 418.3 -1164.45530 -1164.45530 -1163.99768 -1163.50459 0.0000 0.0000 -0.0001 2.0386
bm_list_5117: p4_error: interrupt SIGx: 15
rm_l_1_5140: p4_error: interrupt SIGx: 15
rm_l_1_5140: (196778.906250) net_send: could not write to fd=8, errno = 32
p0_5095: (196779.187500) net_send: could not write to fd=7, errno = 32
p3_8888: p4_error: interrupt SIGx: 13
p2_8887: p4_error: interrupt SIGx: 13
p2_8887: (196782.304688) net_send: could not write to fd=8, errno = 32
p3_8888: (196782.304688) net_send: could not write to fd=8, errno = 32
p1_5118: (196782.933594) net_send: could not write to fd=8, errno = 32
Kostya