Understanding program behavior and data dependencies is important when designing and accelerating applications. However, conventional profiling tools are insufficient for tracking the functions and loops of programs because of compiler optimizations and probe effects. To minimize probe effects, virtual platforms with timing simulation are used to monitor the profiled program, and they provide the flexibility to evaluate future platforms. Nevertheless, the profiling information is not collected at the function or loop level for programmers to analyze and discover performance issues. This paper proposes a stack-pointer-based method with a later loop entry detection scheme to overcome the difficulties of detecting functions and loops for programs running on a virtual platform. With detailed performance counters and memory access patterns recorded along with the loop-call context tree, this paper also presents a framework that collects traces for detailed analysis of both the control flow and the data flow of a program. The experimental results demonstrate the ability of the developed tool to collect and profile a program in loop-call context tree form and to enable further analysis of thread-level parallelism and data dependencies between functions and loops.