Application cluster service scheme for near-zero-downtime services

Fan-Tien Cheng, Shang Lun Wu, Ping Yen Tsai, Yun Ta Chung, Haw Ching Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

The required reliability in applications of a distributed computer system is continuous service for 24 hours a day, 7 days a week. However, computer failures due to exhaustion of operating system resources, data corruption, numerical error accumulation, and so on, may interrupt services and cause significant losses. Hence, this work proposes an application cluster service (APCS) scheme. The proposed APCS provides both a failover scheme and a state recovery scheme for failure management. The failover scheme is designed mainly to automatically activate the backup application for replacing the failed application whenever it is sick or down. Meanwhile, the state recovery scheme is intended primarily to provide an inheritable design pattern to support applications with state recovery requirements. An application simply needs to inherit and implement this design pattern, and then can accomplish the task of state backup and recovery. Furthermore, a performance evaluator (PEV) that can detect performance degradation and predict time to failure is developed in this study. By using these detection and prediction capabilities, the APCS can perform the failover process before node breakdown. Thus, applying APCS and PEV can enable a distributed computer system to provide services with near-zero-downtime.
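The inheritable state recovery pattern and the failover step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the APCS interfaces, so every name here (`StateStore`, `RecoverableApp`, `CounterApp`, `failover`) is hypothetical, and an in-memory dictionary stands in for whatever shared storage a real cluster would use. The idea shown is only that an application inherits a base class, implements two methods, and thereby gains state backup and recovery.

```python
import json
from abc import ABC, abstractmethod


class StateStore:
    """Hypothetical shared store; a dict stands in for storage visible to all nodes."""

    def __init__(self):
        self._data = {}

    def save(self, key, blob):
        self._data[key] = blob

    def load(self, key):
        return self._data.get(key)


class RecoverableApp(ABC):
    """Assumed base class: subclasses implement get_state/set_state only."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    @abstractmethod
    def get_state(self) -> dict:
        """Return the state that must survive a failover."""

    @abstractmethod
    def set_state(self, state: dict) -> None:
        """Restore the application from a previously saved state."""

    def backup(self):
        # Serialize the current state and write it to the shared store.
        self.store.save(self.name, json.dumps(self.get_state()))

    def recover(self):
        # Load the last saved state, if any; report whether recovery happened.
        blob = self.store.load(self.name)
        if blob is None:
            return False
        self.set_state(json.loads(blob))
        return True


class CounterApp(RecoverableApp):
    """Example application whose only state is a request counter."""

    def __init__(self, name, store):
        super().__init__(name, store)
        self.count = 0

    def handle_request(self):
        self.count += 1
        self.backup()  # back up after every state change

    def get_state(self):
        return {"count": self.count}

    def set_state(self, state):
        self.count = state["count"]


def failover(name, store):
    """Activate a backup instance and restore the failed application's state."""
    backup_app = CounterApp(name, store)
    backup_app.recover()
    return backup_app
```

In this sketch, a primary `CounterApp` that has served three requests can be replaced by calling `failover` with the same name and store; the backup instance then resumes with `count == 3`. Backing up after every request is a simplifying choice for the example, not a claim about how often APCS saves state.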

Original language: English
Title of host publication: Proceedings of the 2005 IEEE International Conference on Robotics and Automation
Pages: 4062-4067
Number of pages: 6
Publication status: Published - 2005 Dec 1
Event: 2005 IEEE International Conference on Robotics and Automation - Barcelona, Spain
Duration: 2005 Apr 18 to 2005 Apr 22

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
Volume: 2005
ISSN (Print): 1050-4729

Other

Other: 2005 IEEE International Conference on Robotics and Automation
Country/Territory: Spain
City: Barcelona
Period: 05-04-18 to 05-04-22

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering
