Performance evaluation is key to optimizing applications on multicore systems. Many techniques and profiling tools are available for measuring performance on homogeneous multicore platforms, but most of them depend on hardware support from the vendors. For heterogeneous multicore systems, very few analysis tools exist to help developers. This paper describes a software-based trace collection and performance analysis framework that can be ported to a variety of platforms via code instrumentation at the source level. To support this framework, a pure software profiling toolkit called ParallelTracer was implemented based on ANTLR, an open source parser generator. We use the IBM Cell processor as a case study to demonstrate the capability of ParallelTracer. Our results show that ParallelTracer provides useful information, presented through graphical visualization, that helps programmers understand program behavior and identify potential performance bottlenecks. We also discuss the runtime overhead of ParallelTracer. With proper usage, the performance overhead introduced by our toolkit is limited to between about 5% and 19%, and the code size overhead to about 9%, for the benchmark program in the case study.