Applications running on a distributed computer system are required to provide continuous service 24 hours a day, 7 days a week. However, computer failures caused by exhaustion of operating system resources, data corruption, accumulation of numerical errors, and similar faults may interrupt service and cause significant losses. Hence, this work proposes an application cluster service (APCS) scheme, which provides both a failover scheme and a state recovery scheme for failure management. The failover scheme is designed mainly to activate a backup application automatically to replace the primary application whenever the primary degrades or fails. The state recovery scheme provides an inheritable software architecture that supports applications with state recovery requirements: an application only needs to inherit and implement this scheme to accomplish state backup and recovery. Furthermore, this study develops a performance evaluator (PEV) that can detect performance degradation and predict time to failure. Using these detection and prediction capabilities, the APCS can perform the failover process before a node breaks down. Thus, applying APCS and PEV enables an asynchronous distributed computer system with shared memory to provide services with near-zero downtime.
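The abstract gives no implementation details, so the following Python sketch is purely illustrative and rests on stated assumptions. `StateRecoverable`, `CounterApp`, `predict_time_to_failure`, and `maybe_failover` are hypothetical names; an in-memory `dict` stands in for whatever checkpoint store APCS actually uses; and the time-to-failure estimate is a simple least-squares trend on free memory, a swapped-in technique rather than the PEV's own method. It shows the inherit-and-implement pattern for state backup and recovery, and a failover that fires before the predicted resource exhaustion.

```python
from abc import ABC, abstractmethod
import copy


class StateRecoverable(ABC):
    """Hypothetical base class: an application inherits it and implements
    get_state/set_state to obtain state backup and recovery."""

    @abstractmethod
    def get_state(self) -> dict:
        """Return a snapshot of the application's recoverable state."""

    @abstractmethod
    def set_state(self, state: dict) -> None:
        """Restore the application from a previously saved snapshot."""

    def backup(self, store: dict, key: str) -> None:
        # Deep-copy so later changes to the live state do not alter the checkpoint.
        store[key] = copy.deepcopy(self.get_state())

    def recover(self, store: dict, key: str) -> None:
        self.set_state(copy.deepcopy(store[key]))


class CounterApp(StateRecoverable):
    """Hypothetical example application whose only state is a counter."""

    def __init__(self) -> None:
        self.count = 0

    def handle_request(self) -> None:
        self.count += 1

    def get_state(self) -> dict:
        return {"count": self.count}

    def set_state(self, state: dict) -> None:
        self.count = state["count"]


def predict_time_to_failure(times, free_memory):
    """Assumed predictor: fit free_memory = a + b*t by least squares and return
    the time from the last sample until the fitted line reaches zero.
    Returns None if free memory is not trending downward."""
    n = len(times)
    mean_t = sum(times) / n
    mean_m = sum(free_memory) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct sample times")
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, free_memory))
    slope = sxy / sxx
    if slope >= 0:
        return None
    intercept = mean_m - slope * mean_t
    # Zero crossing of the fitted line, measured from the most recent sample.
    return -intercept / slope - times[-1]


def maybe_failover(primary, backup, store, ttf, threshold):
    """Activate the backup with the primary's last checkpoint when the
    predicted time to failure drops below the (hypothetical) threshold."""
    if ttf is not None and ttf < threshold:
        backup.recover(store, "primary")
        return backup
    return primary


if __name__ == "__main__":
    store = {}
    primary = CounterApp()
    for _ in range(3):
        primary.handle_request()
    primary.backup(store, "primary")

    # Free memory falls by 100 units every 10 s, so it hits zero 10 s after t=30.
    ttf = predict_time_to_failure([0, 10, 20, 30], [400, 300, 200, 100])
    active = maybe_failover(primary, CounterApp(), store, ttf, threshold=60)
    print(ttf, active.count)  # 10.0 3
```

Keeping the backup and recovery logic in the base class means an application only supplies `get_state` and `set_state`, which mirrors the abstract's claim that inheriting and implementing the scheme is enough to gain state recovery.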
| Field | Value |
|---|---|
| Number of pages | 16 |
| Journal | Journal of the Chinese Institute of Engineers, Transactions of the Chinese Institute of Engineers, Series A/Chung-kuo Kung Ch'eng Hsuch K'an |
| Publication status | Published - 2008 Jan 1 |