Preaches – portable recovery and checkpointing in heterogeneous systems

Kuo Feng Ssu, W. Kent Fuchs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Checkpointing in a homogeneous environment, where both checkpointing and recovery are performed on the same type of machine and operating system, has been studied extensively. As heterogeneous distributed systems become pervasive, it is desirable to extend the capability of checkpointing to non-homogeneous environments. This paper describes a prototype, PREACHES, that achieves portable checkpointing of single-process applications in heterogeneous systems using checkpoint propagation. The checkpoint propagation technique generates a machine-dependent checkpoint for each distinct architecture in the heterogeneous environment. When a failure occurs, the failed process can be restarted on a specified machine using the checkpoint that matches that machine's architecture. PREACHES has been implemented on a heterogeneous network of workstations using TCP/IP communication. PREACHES also provides automatic and fast recovery for single-process programs.
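
As a rough illustration of the checkpoint propagation idea in the abstract, the toy Python sketch below keeps one machine-dependent checkpoint per architecture and, on recovery, selects the one that matches the target machine. The abstract does not describe how PREACHES actually produces or stores its checkpoints, so everything here is an assumption made for illustration: the architecture names, the two-field process state, the use of byte order as the only machine dependency, and the function names `make_checkpoint`, `propagate`, and `recover`.

```python
import struct

# Hypothetical architecture table: name -> struct byte-order prefix.
# Byte order stands in for all machine-dependent layout differences.
ARCH_BYTE_ORDER = {"sparc": ">", "x86": "<"}


def make_checkpoint(arch, counter, total):
    """Encode a toy process state (two integers) in the layout of `arch`."""
    return struct.pack(ARCH_BYTE_ORDER[arch] + "iq", counter, total)


def propagate(counter, total):
    """Produce one machine-dependent checkpoint per known architecture."""
    return {arch: make_checkpoint(arch, counter, total)
            for arch in ARCH_BYTE_ORDER}


def recover(checkpoints, target_arch):
    """Restore the state on `target_arch` from the checkpoint made for it."""
    data = checkpoints[target_arch]
    return struct.unpack(ARCH_BYTE_ORDER[target_arch] + "iq", data)


# Checkpoint once, then restart on a machine of either architecture.
checkpoints = propagate(counter=7, total=1000)
restored = recover(checkpoints, "x86")
```

The point of the sketch is the trade-off the abstract implies: storing a ready-to-use checkpoint for every architecture costs extra space and work at checkpoint time, but recovery needs no format conversion on the target machine.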

Original language: English
Title of host publication: Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 38-47
Number of pages: 10
ISBN (Electronic): 0818684704, 9780818684708
Publication status: Published - 1998
Event: 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998 - Munich, Germany
Duration: 1998 Jun 23 to 1998 Jun 25

Publication series

Name: Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Volume: 1998-January

Other

Other: 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Country/Territory: Germany
City: Munich
Period: 1998-06-23 to 1998-06-25

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software
  • Safety, Risk, Reliability and Quality
  • Modelling and Simulation
