TY - GEN
T1 - PREACHES – portable recovery and checkpointing in heterogeneous systems
AU - Ssu, Kuo Feng
AU - Fuchs, W. Kent
N1 - Publisher Copyright: © 1998 IEEE.
PY - 1998
Y1 - 1998
N2 - Checkpointing in a homogeneous environment, where both checkpointing and recovery are performed on the same type of machine and operating system, has been studied extensively. As heterogeneous distributed systems become pervasive, it is desirable to extend checkpointing to non-homogeneous environments. This paper describes a prototype, PREACHES, that achieves portable checkpointing of single-process applications in heterogeneous systems using checkpoint propagation. The checkpoint propagation technique generates a machine-dependent checkpoint for each architecture in the heterogeneous environment. When a failure occurs, the failed process can be restarted on a specified machine using the checkpoint appropriate for that machine's architecture. PREACHES has been implemented on a heterogeneous network of workstations using TCP/IP communication. It also provides automatic and fast recovery for single-process programs.
UR - http://www.scopus.com/inward/record.url?scp=85043584165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043584165&partnerID=8YFLogxK
U2 - 10.1109/FTCS.1998.689453
DO - 10.1109/FTCS.1998.689453
M3 - Conference contribution
AN - SCOPUS:85043584165
T3 - Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
SP - 38
EP - 47
BT - Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Y2 - 23 June 1998 through 25 June 1998
ER -