TY - JOUR
T1 - Kernel support for zero-loss Internet service restart
AU - Chang, Da Wei
AU - Tsai, Chuan Ming
AU - Li, Wei Kou
AU - Lee, Tzu Rung
PY - 2007/7/10
Y1 - 2007/7/10
N2 - Owing to long serving time and huge numbers of clients, Internet services can easily suffer from transient faults. Although restarting a service can solve this problem, information of the on-line requests will be lost owing to the service restart, which is unacceptable for many commercial or transaction-based services. In this paper, we propose an approach to achieve the goal of zero-loss restart for Internet services. Under this approach, a kernel subsystem is responsible for detecting the transient faults, retaining the I/O channels of the service, and managing the service restart flow. In addition, some straightforward modifications to the service should be made to take advantage of the kernel support. To demonstrate the feasibility of our approach, we implemented the subsystem in the Linux kernel. Moreover, we modified a Web server and a CGI program to take advantage of the kernel support. According to the experimental results, our approach incurs little runtime overhead (i.e. less than 3.2%). When the service crashes, it can be restarted quickly (i.e. within 210 μs) with no information loss. Furthermore, the performance impact due to the service crash is small. These results show that the approach can efficiently achieve the goal of zero-loss restart for Internet services.
AB - Owing to long serving time and huge numbers of clients, Internet services can easily suffer from transient faults. Although restarting a service can solve this problem, information of the on-line requests will be lost owing to the service restart, which is unacceptable for many commercial or transaction-based services. In this paper, we propose an approach to achieve the goal of zero-loss restart for Internet services. Under this approach, a kernel subsystem is responsible for detecting the transient faults, retaining the I/O channels of the service, and managing the service restart flow. In addition, some straightforward modifications to the service should be made to take advantage of the kernel support. To demonstrate the feasibility of our approach, we implemented the subsystem in the Linux kernel. Moreover, we modified a Web server and a CGI program to take advantage of the kernel support. According to the experimental results, our approach incurs little runtime overhead (i.e. less than 3.2%). When the service crashes, it can be restarted quickly (i.e. within 210 μs) with no information loss. Furthermore, the performance impact due to the service crash is small. These results show that the approach can efficiently achieve the goal of zero-loss restart for Internet services.
UR - http://www.scopus.com/inward/record.url?scp=34347331367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34347331367&partnerID=8YFLogxK
U2 - 10.1002/spe.787
DO - 10.1002/spe.787
M3 - Article
AN - SCOPUS:34347331367
SN - 0038-0644
VL - 37
SP - 833
EP - 855
JO - Software - Practice and Experience
JF - Software - Practice and Experience
IS - 8
ER -