Designing an application-specific heterogeneous system usually requires considerable intuition and knowledge to harmonize its computing resources. This paper presents a phase-based profiling mechanism that speeds up the process of learning how an application behaves on the hardware and how the hardware, in turn, affects the application. Analyzing program phases lets us gather performance information in a way that highlights how the high-level tasks of an application perform under different hardware settings. We evaluated our phase-based profiling framework on QEMU, employing approximate timing models and mechanisms that track functions and events in the guest system's programs and operating system. Furthermore, timing simulation makes it possible to go beyond the constraints of real machines and to rapidly explore the impact of hardware parameters on system performance. In our experiments, phase-based profiling yields useful information about the runtime behavior and performance of a program, allowing developers to discover bottlenecks and to predict the performance impact of optimization ideas for the software, the underlying hardware, or both. Our results suggest that combining phase profiling with a timing-approximate simulator facilitates hardware/software co-design.