Adaptive checkpointing protocol to bound recovery time with message logging

Kuo-Feng Ssu, Bin Yao, W. Kent Fuchs

Research output: Conference contribution

11 Citations (Scopus)

Abstract

Numerous mathematical approaches have been proposed to determine the optimal checkpoint interval for minimizing the total execution time of an application in the presence of failures. These solutions are often not applicable because accurate data on the probability distribution of failures is lacking. Most current checkpoint libraries require application users to define a fixed time interval for checkpointing. The checkpoint interval usually implies the approximate maximum recovery time for single-process applications. However, actual recovery time can be much shorter when message logging is used. Because of this faster recovery, checkpointing may be more frequent than needed, which introduces unnecessary execution overhead. In this paper, an adaptive checkpointing protocol is developed to accurately enforce the user-defined recovery time and to reduce excessive checkpoints. The adaptive protocol has been implemented and evaluated using a receiver-based message logging algorithm on wired and wireless mobile networks. The results show that the protocol precisely maintains the user-defined maximum recovery times for several traces with varying message exchange rates. The mechanism incurs low overhead, avoids unnecessary checkpointing, and reduces failure-free execution time.
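
The idea of checkpointing against a recovery-time bound rather than a fixed wall-clock interval can be sketched as follows. This is an illustrative sketch only, not the protocol from the paper: the abstract does not describe how recovery time is estimated, so the estimator here (checkpoint restore cost plus busy time accumulated since the last checkpoint) is an assumption, and the names `AdaptiveCheckpointer`, `restore_cost`, `replay_work`, and `record_work` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveCheckpointer:
    """Hypothetical sketch: checkpoint only when the estimated recovery
    time would otherwise exceed a user-defined bound."""

    max_recovery_time: float   # user-defined bound, in seconds
    restore_cost: float        # assumed time to reload a checkpoint, in seconds
    replay_work: float = 0.0   # busy time accumulated since the last checkpoint
    checkpoints_taken: int = 0

    def estimated_recovery_time(self) -> float:
        # Assumption: with logged messages, replay skips idle waiting,
        # so only busy time since the last checkpoint must be redone.
        return self.restore_cost + self.replay_work

    def record_work(self, busy_seconds: float) -> bool:
        """Account for a slice of busy time. If adding it would push the
        estimate past the bound, take a checkpoint first.
        Returns True when a checkpoint was taken."""
        took_checkpoint = False
        if self.estimated_recovery_time() + busy_seconds > self.max_recovery_time:
            self.replay_work = 0.0          # a new checkpoint resets replay work
            self.checkpoints_taken += 1
            took_checkpoint = True
        self.replay_work += busy_seconds
        return took_checkpoint


if __name__ == "__main__":
    cp = AdaptiveCheckpointer(max_recovery_time=10.0, restore_cost=2.0)
    for step in range(9):
        if cp.record_work(1.0):
            print(f"checkpoint taken before work slice {step}")
    print("estimated recovery time:", cp.estimated_recovery_time())
```

Under these assumptions, a process that is mostly idle while waiting for messages accumulates replay work slowly and checkpoints rarely, whereas a fixed timer would checkpoint at every interval regardless of how little work would need to be redone.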

Original language: English
Title of host publication: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Publisher: IEEE
Pages: 244-252
Number of pages: 9
ISBN (Print): 0769502911
Publication status: Published - 1 Dec 1999
Event: Proceedings of the 1999 18th IEEE Symposium on Reliable Distributed Systems (SRDS'99) - Lausanne, Switz
Duration: 19 Oct 1999 to 22 Oct 1999

Publication series

Name: Proceedings of the IEEE Symposium on Reliable Distributed Systems
ISSN (Print): 1060-9857

Other

Other: Proceedings of the 1999 18th IEEE Symposium on Reliable Distributed Systems (SRDS'99)
City: Lausanne, Switz
Period: 99-10-19 to 99-10-22

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
