TY - GEN
T1 - A virtual timing device for program performance analysis
AU - Hsu, Wen Chang
AU - Hung, Shih Hao
AU - Tu, Chia Heng
PY - 2010
Y1 - 2010
N2 - Functional virtual platforms have been widely used to support system development without needing the actual hardware. While the emulation process is fast enough to model the behaviors of complex systems, performance assessment cannot be done accurately due to the lack of timing models for the simulated systems. To tackle the problem, we proposed a virtual timing device (VTD) for a functional virtual platform to advance simulated clock time based on the hardware/software events observed during the emulation process. As a case study, we implemented the VTD in QEMU, an open-source virtual platform, with a variety of timing algorithms offering trade-offs between the accuracy and speed of timing estimation. With a fast but less accurate timing algorithm, quick performance analysis can be done on QEMU at approximately 67 million instructions per second, and the reported execution times for the MiBench benchmarks have an average error of 15.7%. Highly accurate performance profiles can be obtained by elaborating the timing model, e.g. with the addition of cache simulation, at the cost of simulation speed.
AB - Functional virtual platforms have been widely used to support system development without needing the actual hardware. While the emulation process is fast enough to model the behaviors of complex systems, performance assessment cannot be done accurately due to the lack of timing models for the simulated systems. To tackle the problem, we proposed a virtual timing device (VTD) for a functional virtual platform to advance simulated clock time based on the hardware/software events observed during the emulation process. As a case study, we implemented the VTD in QEMU, an open-source virtual platform, with a variety of timing algorithms offering trade-offs between the accuracy and speed of timing estimation. With a fast but less accurate timing algorithm, quick performance analysis can be done on QEMU at approximately 67 million instructions per second, and the reported execution times for the MiBench benchmarks have an average error of 15.7%. Highly accurate performance profiles can be obtained by elaborating the timing model, e.g. with the addition of cache simulation, at the cost of simulation speed.
UR - https://www.scopus.com/pages/publications/78249286245
UR - https://www.scopus.com/pages/publications/78249286245#tab=citedBy
U2 - 10.1109/CIT.2010.389
DO - 10.1109/CIT.2010.389
M3 - Conference contribution
AN - SCOPUS:78249286245
SN - 9780769541082
T3 - Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
SP - 2255
EP - 2260
BT - Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
T2 - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010
Y2 - 29 June 2010 through 1 July 2010
ER -