[Q-e-developers] New restart mechanism and signal handlers
Lorenzo Paulatto
lorenzo.paulatto at impmc.upmc.fr
Wed Jun 4 10:52:42 CEST 2014
Hello,
I agree with Paolo that restarting from a random crash is unpredictable
and unmaintainable. However, to make everybody happy I have extended the
current (normally disabled) signal handling mechanism to intercept the
signal that queue systems normally send a few minutes before killing the
job, and to use it to trigger a clean exit.
Actually I had implemented this years ago, but I never uploaded it as it
increases the likelihood that the code will be forcefully killed while
it is writing the data!
In the current situation, being killed while writing the data has the
same result (i.e. cannot restart) as being killed somewhere else, hence
I see no problem in implementing this change.
I've also implemented a clean exit when pressing CTRL-C; a double press
of CTRL-C will still kill the code immediately.
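
To illustrate the double CTRL-C behaviour, here is a minimal sketch in C.
It is not the code that is in the repository; the names stop_requested,
install_sigint_handler and stop_was_requested are made up for this
example. The first SIGINT only sets a flag that the main loop would poll
at a safe point, and the handler puts the default action back so that a
second SIGINT terminates the process at once.

```c
#include <signal.h>

/* Hypothetical flag, polled by the main loop at a safe point
   (for example at the end of an iteration). */
static volatile sig_atomic_t stop_requested = 0;

/* First CTRL-C: ask for a clean exit and restore the default action,
   so that a second CTRL-C kills the process immediately. */
static void on_sigint(int signum)
{
    stop_requested = 1;
    signal(signum, SIG_DFL);
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_sigint_handler(void)
{
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Non-zero once a clean exit has been requested. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

The handler does nothing except set the flag and reset the disposition,
so all the actual writing of restart data stays in the main code path.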
I think that most queue systems send SIGTERM before SIGKILL, but some
may also send SIGINT or SIGUSR*. If people can start to test the code,
we can easily add more signals.
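
As a rough sketch of why adding signals is cheap, the handler can be
registered from a table, so supporting one more signal means adding one
entry. This is only an illustration under my own assumptions, not the
uploaded implementation: the names clean_exit_signals,
install_clean_exit_handlers and pending_exit_signal are hypothetical, and
the list of signals is just the set mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Number of the last trapped signal, 0 if none arrived yet. */
static volatile sig_atomic_t exit_signal = 0;

static void record_signal(int signum)
{
    exit_signal = signum;
}

/* Hypothetical table: extend it to trap what a given queue system sends. */
static const int clean_exit_signals[] = { SIGTERM, SIGINT, SIGUSR1, SIGUSR2 };

/* Install the same handler for every signal in the table.
   Returns 0 on success, -1 on the first failure. */
int install_clean_exit_handlers(void)
{
    struct sigaction sa;
    size_t n = sizeof clean_exit_signals / sizeof clean_exit_signals[0];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);

    for (size_t i = 0; i < n; ++i)
        if (sigaction(clean_exit_signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* To be polled by the main loop; non-zero means "exit cleanly now". */
int pending_exit_signal(void)
{
    return (int)exit_signal;
}
```

SIGKILL cannot be trapped, so the clean exit has to finish within the
grace period that the queue system leaves after the first signal.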
I'll go ahead and upload the change later if there are no complaints;
it is still disabled by default. I think it would be sensible to enable
it by default in parallel compilation.
cheers
p.s. I've also implemented the possibility to put the code into daemon
mode when SIGHUP is received (e.g. the ssh connection dies, pw.x keeps
running in the background), but it is fragile and not really useful. If
you want it, let me know.
--
Dr. Lorenzo Paulatto
IdR @ IMPMC -- CNRS & Université Paris 6
+33 (0)1 44 275 084 / skype: paulatz
http://www-int.impmc.upmc.fr/~paulatto/
23-24/4é16 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05