TY - GEN
T1 - Application cluster service scheme for near-zero-downtime services
AU - Cheng, Fan-Tien
AU - Wu, Shang Lun
AU - Tsai, Ping Yen
AU - Chung, Yun Ta
AU - Yang, Haw Ching
PY - 2005/12/1
Y1 - 2005/12/1
N2 - Applications in a distributed computer system are often required to provide continuous service 24 hours a day, 7 days a week. However, computer failures caused by exhaustion of operating system resources, data corruption, numerical error accumulation, and similar faults may interrupt services and cause significant losses. Hence, this work proposes an application cluster service (APCS) scheme. The proposed APCS provides both a failover scheme and a state recovery scheme for failure management. The failover scheme is designed mainly to activate a backup application automatically to replace a failed application whenever it is sick or down. The state recovery scheme is intended primarily to provide an inheritable design pattern that supports applications with state recovery requirements: an application only needs to inherit and implement this design pattern to accomplish state backup and recovery. Furthermore, a performance evaluator (PEV) that can detect performance degradation and predict time to failure is developed in this study. Using these detection and prediction capabilities, the APCS can perform the failover process before a node breaks down. Thus, applying APCS and PEV can enable a distributed computer system to provide services with near-zero downtime.
UR - http://www.scopus.com/inward/record.url?scp=33846136950&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33846136950&partnerID=8YFLogxK
U2 - 10.1109/ROBOT.2005.1570743
DO - 10.1109/ROBOT.2005.1570743
M3 - Conference contribution
AN - SCOPUS:33846136950
SN - 078038914X
SN - 9780780389144
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 4062
EP - 4067
BT - Proceedings of the 2005 IEEE International Conference on Robotics and Automation
T2 - 2005 IEEE International Conference on Robotics and Automation
Y2 - 18 April 2005 through 22 April 2005
ER -