Preaches – portable recovery and checkpointing in heterogeneous systems

Kuo-Feng Ssu, W. Kent Fuchs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Checkpointing in a homogeneous environment, where both checkpointing and recovery are performed on the same type of machine and operating system, has been studied extensively. As heterogeneous distributed systems become pervasive, it is desirable to extend the capability of checkpointing to non-homogeneous environments. This paper describes a prototype, PREACHES, that achieves portable checkpointing of single-process applications in heterogeneous systems using checkpoint propagation. The checkpoint propagation technique generates machine-dependent checkpoints for each different architecture in the heterogeneous environment. When a failure occurs, the failed process can be restarted on a specified machine with the checkpoint that is appropriate for that machine's architecture. An implementation of PREACHES on a heterogeneous network of workstations has been successfully developed based on TCP/IP communication. PREACHES also provides automatic and fast recovery for single-process programs.
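
The recovery step described in the abstract, choosing the checkpoint that matches the architecture of the restart machine, can be illustrated with a minimal sketch. This is not the PREACHES implementation: the class names, the architecture labels ("sparc", "x86"), and the in-memory store are all hypothetical, and how each machine-dependent checkpoint is actually produced and transported over TCP/IP is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Checkpoint:
    arch: str       # architecture that can restore this checkpoint (hypothetical label)
    sequence: int   # checkpoint interval number
    state: bytes    # opaque, machine-dependent process image


@dataclass
class CheckpointStore:
    """Keeps the most recent checkpoint for each architecture (illustrative only)."""
    latest: Dict[str, Checkpoint] = field(default_factory=dict)

    def propagate(self, ckpt: Checkpoint) -> None:
        # Record a checkpoint only if it is newer than the one already held
        # for the same architecture.
        current = self.latest.get(ckpt.arch)
        if current is None or ckpt.sequence > current.sequence:
            self.latest[ckpt.arch] = ckpt

    def select_for_recovery(self, target_arch: str) -> Optional[Checkpoint]:
        # On failure, pick the checkpoint matching the restart machine's
        # architecture; None means no usable checkpoint exists for it.
        return self.latest.get(target_arch)


store = CheckpointStore()
store.propagate(Checkpoint("sparc", 1, b"sparc-image-1"))
store.propagate(Checkpoint("x86", 1, b"x86-image-1"))
store.propagate(Checkpoint("sparc", 2, b"sparc-image-2"))

# Restart the failed process on an x86 machine using the x86 checkpoint.
restart = store.select_for_recovery("x86")
```

Keeping one checkpoint per architecture trades extra storage and propagation traffic for a restart that needs no format conversion on the target machine, which is consistent with the fast recovery the abstract claims.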

Original language: English
Title of host publication: Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 0818684704, 9780818684708
DOIs: https://doi.org/10.1109/FTCS.1998.689453
Publication status: Published - 1998 Jan 1
Event: 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998 - Munich, Germany
Duration: 1998 Jun 23 to 1998 Jun 25

Publication series

Name: Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Volume: 1998-January

Other

Other: 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Country: Germany
City: Munich
Period: 98-06-23 to 98-06-25

Fingerprint

Checkpointing
Checkpoint
Heterogeneous Systems
Recovery
Computer workstations
Heterogeneous networks
Propagation
Network of Workstations
TCP/IP
Heterogeneous Environment
Communication
Operating Systems
Distributed Systems
Prototype
Dependent
Architecture

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software
  • Safety, Risk, Reliability and Quality
  • Modelling and Simulation

Cite this

Ssu, K.-F., & Fuchs, W. K. (1998). Preaches – portable recovery and checkpointing in heterogeneous systems. In Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998 (pp. 1-10). (Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998; Vol. 1998-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/FTCS.1998.689453
@inproceedings{1704a38276254d62abe3731cc03974fc,
title = "Preaches – portable recovery and checkpointing in heterogeneous systems",
author = "Ssu, Kuo-Feng and Fuchs, W. Kent",
year = "1998",
month = "1",
day = "1",
doi = "10.1109/FTCS.1998.689453",
language = "English",
series = "Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1--10",
booktitle = "Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998",
address = "United States",

}
