[Pw_forum] MPI & disk unavailability

Axel Kohlmeyer akohlmey at vitae.cmm.upenn.edu
Mon Nov 28 18:15:56 CET 2005


On Mon, 28 Nov 2005, Konstantin Kudin wrote:

KK>  Hi,

kostya,

KK>  So my question is if it is possible to make MPI wait for disk
KK> availability when writing the restart files?

please note that this has _nothing_ to do with MPI itself,
but with making the _application_ fault tolerant. while this
may be desirable for certain environments, it also tends to
add a _lot_ of clutter to the code, which makes maintaining
it a nightmare. the best compromise usually is to make sure
that you can write intermediate restart files, optionally to
a special place, and that you can set how often they are
written (according to the stability of the machine).
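
as a rough illustration of that compromise (not code from any
existing package), here is a minimal python sketch. the names
write_restart, run, restart_every and restart_dir are made up for
this example. it assumes that a failed write shows up as an OSError,
and it writes to a temporary file first so that a failed attempt
never destroys the previous good restart file.

```python
import os
import tempfile
import time


def write_restart(data: bytes, directory: str, name: str = "restart.dat",
                  retries: int = 3, wait_seconds: float = 1.0) -> bool:
    """Try to write a restart file atomically; return True on success."""
    for attempt in range(retries):
        tmp_path = None
        try:
            os.makedirs(directory, exist_ok=True)
            # write to a temporary file in the same directory first, so a
            # failed write never clobbers the previous good restart file
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".")
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
                handle.flush()
                os.fsync(handle.fileno())
            # rename over the old file only after the new one is complete
            os.replace(tmp_path, os.path.join(directory, name))
            return True
        except OSError:
            # disk full, file server gone, ...: clean up, then wait a bit
            if tmp_path is not None and os.path.exists(tmp_path):
                try:
                    os.remove(tmp_path)
                except OSError:
                    pass
            if attempt + 1 < retries:
                time.sleep(wait_seconds)
    return False


def run(n_steps: int, restart_every: int, restart_dir: str) -> int:
    """Toy main loop; returns the number of restart files written."""
    written = 0
    state = 0
    for step in range(1, n_steps + 1):
        state += step  # stand-in for the real simulation step
        # restart_every <= 0 switches intermediate restart files off
        if restart_every > 0 and step % restart_every == 0:
            if write_restart(repr((step, state)).encode(), restart_dir):
                written += 1
    return written
```

on a fragile machine one would pick a small restart_every and point
restart_dir at a disk that is known to be reliable. note that a
failed write only costs one checkpoint here, the run itself goes on.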

while this is generally a problem of the setup of the respective
machine, and one usually cannot afford to make a code run on even
the most crappy hardware, one should have at least some helpful
options for some kind of 'performance degraded mode' on
fragile machines, as most high-end hardware tends to be quite
fragile during its introduction.

KK>  2946  0.04081    0.0  418.3 -1164.45530 -1164.45530 -1163.99768
KK> -1163.50459  0.0000  0.0000 -0.0001  2.0386
KK> bm_list_5117:  p4_error: interrupt SIGx: 15
KK> rm_l_1_5140:  p4_error: interrupt SIGx: 15

so, just one of your processes died, and MPICH
did not fully recognize it and thus failed on writing
to a half-closed tcp socket, with an error message that
is cryptic to a regular unix user.
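
to make the message a bit less cryptic: assuming the number printed
after 'SIGx:' is the plain unix signal number, it can be looked up
with a few lines of python. describe_signal is a made-up helper name
for this sketch.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name for a signal number, or 'unknown'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # this platform has no signal with that number
        return "unknown"


print(describe_signal(15))
```

signal 15 is SIGTERM, i.e. the processes that printed those lines
were asked to terminate.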

best regards,
    axel.

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.



