Software is critical to Internet service availability, since a service may become unavailable because of software faults or software maintenance. In this paper, we propose a framework that enables zero-loss recovery and online maintenance for Internet services. The framework is based on virtual machine (VM) technology and a connection migration technique called FT-TCP. It can recover from transient application and operating system faults, and it supports both fault recovery and online maintenance on a single host. The framework substantially enhances FT-TCP so that it runs efficiently in a VM environment. Specifically, we propose techniques to reduce inter-VM switches and communication. Moreover, we propose service-specific optimizations to reduce the recovery time of FT-TCP. Finally, the framework provides an interface through which service designers can implement further service-specific optimizations. The framework was implemented by modifying Xen, an open source VM monitor, and the Linux kernel running on top of Xen. Its effectiveness and efficiency were evaluated by running two Internet services, a WWW proxy and FTP, on the framework and measuring the impact on their performance. According to the experimental results, our approach incurs only slight performance overhead (less than 4%) and memory overhead (less than 750 KB) for both services.