Conventional remote data access middleware usually gives client applications either a pre-staging scheme or an on-demand access scheme for fetching data. The pre-staging scheme uses parallel downloads to fetch a complete input file from multiple data sources, even when only a tiny fragment of the file is required. Such transfers waste data transmission time and storage space. In contrast, the on-demand scheme downloads only the required data blocks, but from a single data source, so it does not fully utilize the downstream bandwidth of the computing nodes. This paper presents a middleware called Spigot that enables legacy (grid-unaware) applications to access remote data transparently through native I/O function calls. Spigot uses the on-demand concept to avoid unnecessary data transfer and adopts a co-allocation download algorithm to improve data transfer performance. Moreover, it uses a pre-fetching strategy to reduce data waiting time by overlapping data acquisition with data processing. It also provides each client application with its own user-level cache, which is advantageous because it can offer more cache space than the kernel-level cache. Further, it is easy to maintain data consistency between Spigot nodes. The experimental results indicate that Spigot reduces data waiting time more effectively than either the pre-staging or the on-demand access scheme.
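As a rough illustration of how the mechanisms named above can fit together, the following Python sketch combines on-demand block reads, parallel fetching from several replicas, read-ahead, and a user-level block cache. It is not Spigot's implementation: the names `Replica`, `BlockReader`, `BLOCK_SIZE`, and `prefetch` are hypothetical, the replicas are in-memory stand-ins for remote data sources, and the static round-robin block assignment is an assumed placeholder for the co-allocation download algorithm, whose details the abstract does not give.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the example is easy to trace


class Replica:
    """Hypothetical stand-in for one remote data source holding a full copy."""

    def __init__(self, data: bytes):
        self.data = data

    def read_block(self, index: int) -> bytes:
        start = index * BLOCK_SIZE
        return self.data[start:start + BLOCK_SIZE]


class BlockReader:
    """On-demand reader with a user-level block cache and read-ahead."""

    def __init__(self, replicas, size: int, prefetch: int = 2):
        self.replicas = replicas
        self.size = size
        self.num_blocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
        self.prefetch = prefetch  # number of blocks to request beyond the read
        self.blocks = {}          # user-level cache: block index -> Future of bytes
        self.pool = ThreadPoolExecutor(max_workers=len(replicas))

    def _request(self, index: int):
        # Request each block at most once. Blocks are assigned to replicas
        # round-robin (an assumption, not the paper's algorithm), so
        # neighbouring blocks are downloaded from different sources in parallel.
        if index not in self.blocks:
            replica = self.replicas[index % len(self.replicas)]
            self.blocks[index] = self.pool.submit(replica.read_block, index)
        return self.blocks[index]

    def read(self, offset: int, length: int) -> bytes:
        if length <= 0 or offset >= self.size:
            return b""
        first = offset // BLOCK_SIZE
        last = min((offset + length - 1) // BLOCK_SIZE, self.num_blocks - 1)
        ahead = min(last + self.prefetch, self.num_blocks - 1)
        # Start downloads for the needed blocks and the read-ahead blocks.
        for i in range(first, ahead + 1):
            self._request(i)
        # Wait only for the needed blocks; read-ahead blocks keep downloading
        # in the background while the caller processes the returned data.
        data = b"".join(self.blocks[i].result() for i in range(first, last + 1))
        skip = offset - first * BLOCK_SIZE
        return data[skip:skip + length]

    def close(self):
        self.pool.shutdown(wait=True)
```

A caller would construct `BlockReader` over two or more replicas of the same file and call `read(offset, length)` much like a positional file read. Only the blocks covering the requested range, plus up to `prefetch` blocks after it, are ever transferred, in contrast to pre-staging the whole file.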