Because they run for long periods and serve huge numbers of clients, Internet services are prone to transient faults. Restarting a service clears such a fault, but information about the on-line requests is lost in the restart, which is unacceptable for many commercial or transaction-based services. In this paper, we propose an approach that achieves zero-loss restart for Internet services. Under this approach, a kernel subsystem is responsible for detecting transient faults, retaining the I/O channels of the service, and managing the service restart flow. The service itself requires some straightforward modifications to take advantage of the kernel support. To demonstrate the feasibility of our approach, we implemented the subsystem in the Linux kernel and modified a Web server and a CGI program to use it. According to the experimental results, our approach incurs little runtime overhead (less than 3.2%). When the service crashes, it can be restarted quickly (within 210 μs) with no information loss. Furthermore, the performance impact of a service crash is small. These results show that the approach can efficiently achieve zero-loss restart for Internet services.