Heterogeneous multicore systems have become the trend in the embedded market because of their power/performance advantage over traditional designs. However, monitoring the workload, scheduling tasks, and managing energy consumption to enable energy-efficient, real-time applications are increasingly challenging on such systems as both the applications and the systems become more complex. In this paper, we introduce a comprehensive approach that addresses these key problems and accelerates the design of a heterogeneous multicore embedded system by providing a suite of energy-aware system software with tightly coupled real-time support and performance/power modeling facilities. We start with a rapid full-system modeling and simulation framework that is used to characterize the application workload, design energy-saving algorithms, and verify during the early design stage whether the hardware specifications meet the performance requirements. With special consideration for today's multicore embedded systems, we developed several key components and integrated them into a system software suite: a portable, efficient library that supports inter-core communication and multicore programming, a lightweight kernel that dynamically monitors and shares the workload among the processor cores, and a dynamic voltage and frequency scaling (DVFS) scheme that adjusts the processor settings to save energy. The system software has been implemented on the PAC Duo system as a case study, and experimental results demonstrate the effectiveness of the proposed approach. This paper discusses the novel techniques included in this system software and shares the lessons that we have learned from this work.
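As a minimal illustration of the general idea behind dynamic voltage and frequency scaling, the sketch below selects the slowest operating point that still meets a task deadline and compares relative dynamic energy using the standard approximation that dynamic power scales with V^2 * f. It is not the scheme implemented in the paper: the operating points, the names `OperatingPoint`, `pick_operating_point`, and `relative_dynamic_energy`, and the deadline-driven policy are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int   # core clock in MHz (hypothetical value)
    volt: float     # supply voltage in volts (hypothetical value)


# Hypothetical operating points, sorted by ascending frequency.
# These are NOT the PAC Duo settings; they only illustrate the policy.
OPERATING_POINTS = [
    OperatingPoint(100, 0.9),
    OperatingPoint(200, 1.0),
    OperatingPoint(400, 1.2),
]


def pick_operating_point(cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest point that finishes `cycles` within `deadline_s`.

    Falls back to the fastest point if no point can meet the deadline.
    """
    for point in points:
        exec_time_s = cycles / (point.freq_mhz * 1e6)
        if exec_time_s <= deadline_s:
            return point
    return points[-1]


def relative_dynamic_energy(cycles, point):
    """Relative dynamic energy for a fixed cycle count.

    With P ~ C * V^2 * f and t = cycles / f, energy ~ C * V^2 * cycles.
    The capacitance C is dropped, so only ratios are meaningful.
    """
    return point.volt ** 2 * cycles
```

Under these assumed operating points, a task of 10 million cycles with a 60 ms deadline would run at 200 MHz rather than 400 MHz, because 100 MHz would miss the deadline and the lower voltage at 200 MHz reduces the relative dynamic energy for the same cycle count.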