TY - JOUR
T1 - Scalable dynamic instruction scheduler through wake-up spatial locality
AU - Chen, Chung Ho
AU - Hsiao, Kuo Su
N1 - Funding Information:
The authors thank all the reviewers for their helpful suggestions, which strengthened the paper. Chia-Jung Hsu contributed to part of the simulations. This work was supported in part by the National Science Council, Taiwan, Grant NSC 94-2220-E-006-008.
PY - 2007/11
Y1 - 2007/11
N2 - In a high-performance superscalar processor, the instruction scheduler often suffers from poor scalability and high complexity due to the expensive wakeup operation. From detailed simulation-based analyses, we find that 95% of the wakeup distances between two dependent instructions are short, within a range of 16 instructions, and 99% are within a range of 31 instructions. We apply this wakeup spatial locality to the design of conventional CAM-based and matrix-based wakeup logic, respectively. By limiting the wakeup coverage to i + 16 instructions, where 0 ≤ i ≤ 15 for 16-entry segments, the proposed wakeup designs confine the wakeup operation to two matrix-based or three CAM-based 16-entry segments no matter how large the issue window is. The experimental results show that for an issue window of 128 entries (IW128) or 256 entries (IW256), the proposed CAM-based wakeup locality design saves 65% (IW128) and 76% (IW256) of the power consumption and reduces the wakeup latency by 44% (IW128) and 78% (IW256) compared to the conventional CAM-based design, with almost no performance loss. For the matrix-based wakeup logic, applying wakeup locality to the design drastically reduces the area cost. Extensive simulation results, including comparisons with previous work, show that wakeup spatial locality is the key element in achieving scalability for future sophisticated instruction schedulers.
AB - In a high-performance superscalar processor, the instruction scheduler often suffers from poor scalability and high complexity due to the expensive wakeup operation. From detailed simulation-based analyses, we find that 95% of the wakeup distances between two dependent instructions are short, within a range of 16 instructions, and 99% are within a range of 31 instructions. We apply this wakeup spatial locality to the design of conventional CAM-based and matrix-based wakeup logic, respectively. By limiting the wakeup coverage to i + 16 instructions, where 0 ≤ i ≤ 15 for 16-entry segments, the proposed wakeup designs confine the wakeup operation to two matrix-based or three CAM-based 16-entry segments no matter how large the issue window is. The experimental results show that for an issue window of 128 entries (IW128) or 256 entries (IW256), the proposed CAM-based wakeup locality design saves 65% (IW128) and 76% (IW256) of the power consumption and reduces the wakeup latency by 44% (IW128) and 78% (IW256) compared to the conventional CAM-based design, with almost no performance loss. For the matrix-based wakeup logic, applying wakeup locality to the design drastically reduces the area cost. Extensive simulation results, including comparisons with previous work, show that wakeup spatial locality is the key element in achieving scalability for future sophisticated instruction schedulers.
UR - http://www.scopus.com/inward/record.url?scp=35148817767&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35148817767&partnerID=8YFLogxK
U2 - 10.1109/TC.2007.70743
DO - 10.1109/TC.2007.70743
M3 - Article
AN - SCOPUS:35148817767
SN - 0018-9340
VL - 56
SP - 1534
EP - 1548
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 11
ER -