Hadoop is a widely used software framework for handling massive data. As heterogeneous computing gains its momentum, variants of Hadoop have been developed to Offoad the computation of the Hadoop applications onto the heterogeneous processors, such as GPUS, DSPs, and FPGA. Unfortunately, these variants do not support on-demand resource scaling for the deadline-aware applications in a sophisticated heterogeneous computing environment. In this work, we developed a framework called Virtual Hadoop, which scales out the required computing resources for the applications automatically to meet the given real-time requirements. We extended the methods of resource inference and allocation for the heterogeneous computing environments. On top of these methods, an auto-scaling mechanism was developed to dynamically allocate resources on-demand based on profile data and performance models for the application execution to meet the given time requirements. In addition, Virtual Hadoop can utilize Docker containers to facilitate the auto-scaling mechanism, where a container encapsulates a Hadoop node with the capability to leverage heterogeneous computing engines. Our experimental results reveal the efficiency of Virtual Hadoop, and hopefully the experiences and discussion presented in this paper will ease the adoption of heterogeneous computing for Efficient big data processing. Copyright is held by the owner/author(s). Publication rights licensed to ACM.