[Q-e-developers] New restart mechanism and signal handlers

Lorenzo Paulatto lorenzo.paulatto at impmc.upmc.fr
Wed Jun 4 10:52:42 CEST 2014


Hello,
I agree with Paolo that restarting after a random crash is unpredictable 
and unmaintainable. However, to make everybody happy, I have extended the 
current (normally disabled) signal handling mechanism so that it 
intercepts the signal that queue systems normally send a few minutes 
before killing a job, and uses it to trigger a clean exit.

I actually implemented this years ago, but I never uploaded it because it 
increases the likelihood that the code will be forcefully killed while 
it is writing the data!

In the current situation, being killed while writing the data has the 
same result (i.e. the run cannot be restarted) as being killed anywhere 
else, so I see no problem in implementing this change.

I've also implemented a clean exit when CTRL-C is pressed; pressing 
CTRL-C twice will still kill the code immediately.
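The two-stage CTRL-C behaviour can be sketched as a small stateful SIGINT handler, again in Python and only as an illustration: the class name, the `hard_exit` hook and the message text are hypothetical, and the exit status 130 (128 + SIGINT) is simply the common shell convention.

```python
import os
import signal


class CtrlCHandler:
    """SIGINT handler: the first CTRL-C requests a clean exit,
    a second CTRL-C exits immediately without cleanup."""

    def __init__(self, hard_exit=None):
        self.presses = 0
        # Hypothetical hook for the immediate exit; os._exit skips all cleanup.
        # 128 + signal number (130 for SIGINT) is the usual shell convention.
        self.hard_exit = hard_exit or (lambda: os._exit(128 + signal.SIGINT))

    def __call__(self, signum, frame):
        self.presses += 1
        if self.presses == 1:
            print("Interrupt received: stopping at the next safe point. "
                  "Press CTRL-C again to kill the code immediately.")
        else:
            self.hard_exit()

    @property
    def stop_requested(self):
        """True once at least one CTRL-C has been received."""
        return self.presses > 0
```

Installed with `signal.signal(signal.SIGINT, CtrlCHandler())`, the main loop would poll `stop_requested` exactly as it polls any other stop condition, while the second press bypasses the loop entirely.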

I think that most queue systems send SIGTERM before SIGKILL, but some 
may also send SIGINT or SIGUSR*. If people start testing the code, we 
can easily add more signals.
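One way to keep the list of handled signals easy to extend, sketched in Python under the assumption that every such signal should trigger the same clean exit: the signals are named in a single tuple and all routed to one handler. The tuple, `on_clean_exit_signal` and `install` are hypothetical; which signals a given queue system actually sends, and how long before SIGKILL, has to be checked in its own documentation. SIGKILL itself cannot be caught or ignored, which is why the code must react to the earlier signal.

```python
import signal

# Signals that may announce an imminent kill. SIGUSR1/SIGUSR2 exist only on
# POSIX systems, so names missing on this platform are skipped. SIGKILL is
# deliberately absent: it cannot be caught or ignored.
CLEAN_EXIT_SIGNALS = ("SIGTERM", "SIGINT", "SIGUSR1", "SIGUSR2")

received = []  # names of the signals caught so far


def on_clean_exit_signal(signum, frame):
    """Record which signal asked for the clean exit."""
    received.append(signal.Signals(signum).name)


def install(names=CLEAN_EXIT_SIGNALS):
    """Install the handler for every listed signal available on this platform."""
    installed = []
    for name in names:
        sig = getattr(signal, name, None)
        if sig is None:
            continue
        signal.signal(sig, on_clean_exit_signal)
        installed.append(name)
    return installed
```

With this layout, supporting a new queue system that uses a different signal is a one-line change to the tuple.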

If there are no complaints, I'll go ahead and upload the change later; 
it is still disabled by default. I think it would be sensible to enable 
it by default when compiling in parallel.

cheers


p.s. I've also implemented the option of switching the code to daemon 
mode when SIGHUP is received (e.g. if the ssh connection dies, pw.x 
keeps running in the background), but it is fragile and not really 
useful; if you want it, let me know.

-- 
Dr. Lorenzo Paulatto
IdR @ IMPMC -- CNRS & Université Paris 6
+33 (0)1 44 275 084 / skype: paulatz
http://www-int.impmc.upmc.fr/~paulatto/
23-24/4é16 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05