We present a load target prediction scheme that mitigates the impact of load latency in modern microprocessors. The scheme uses a cache-like buffer to supply the base address, offset, and operand size at the instruction fetch stage of the pipeline, so that the load target address can be computed early, at the decode stage. By using the load stride dynamically, the scheme achieves a prediction rate 15% higher than a previously proposed approach. With a 128-entry direct-mapped load-prediction buffer, two adders, and two forwarding paths, the scheme gives a 4-fetch processor an average speedup of 10% to 32% as the data cache latency increases from 2 cycles to 4 cycles. We also develop a bit-array design for the prediction scheme that supports multiple-cast writes and eliminates the associative logic commonly used in base register caching.
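As a rough illustration of the buffer described in the abstract, the following is a minimal software sketch of a direct-mapped load-prediction buffer indexed by the fetch PC. It is not the authors' hardware design. Only the 128-entry size and the stored fields (base, offset, operand size) come from the abstract. The class and method names, the 4-byte instruction assumption, the use of a base register number rather than a cached base value, and the simple last-address-plus-stride fallback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_ENTRIES = 128  # buffer size quoted in the abstract


@dataclass
class Entry:
    tag: int            # upper PC bits, to detect index conflicts
    base_reg: int       # base register number (assumed representation)
    offset: int         # immediate offset of the load
    size: int           # operand size in bytes
    last_addr: int = 0  # last observed target address (assumption)
    stride: int = 0     # last observed address delta (assumption)


class LoadPredictionBuffer:
    """Hypothetical direct-mapped buffer, looked up with the fetch PC."""

    def __init__(self, num_entries: int = NUM_ENTRIES) -> None:
        self.num_entries = num_entries
        self.entries: List[Optional[Entry]] = [None] * num_entries

    def _index_tag(self, pc: int):
        word = pc >> 2  # assumes 4-byte, aligned instructions
        return word % self.num_entries, word // self.num_entries

    def lookup(self, pc: int) -> Optional[Entry]:
        idx, tag = self._index_tag(pc)
        entry = self.entries[idx]
        return entry if entry is not None and entry.tag == tag else None

    def predict(self, pc: int, regs: List[int],
                base_ready: bool = True) -> Optional[int]:
        """Return a predicted load address, or None on a buffer miss."""
        entry = self.lookup(pc)
        if entry is None:
            return None
        if base_ready:
            # Base register value is available: base + offset.
            return regs[entry.base_reg] + entry.offset
        # Otherwise fall back to the stride (illustrative policy only).
        return entry.last_addr + entry.stride

    def update(self, pc: int, base_reg: int, offset: int,
               size: int, actual_addr: int) -> None:
        """Record the resolved load so later fetches can be predicted."""
        idx, tag = self._index_tag(pc)
        entry = self.entries[idx]
        if entry is not None and entry.tag == tag:
            entry.stride = actual_addr - entry.last_addr
            entry.last_addr = actual_addr
            entry.base_reg, entry.offset, entry.size = base_reg, offset, size
        else:
            # Miss or conflict: direct-mapped, so simply overwrite.
            self.entries[idx] = Entry(tag, base_reg, offset, size,
                                      last_addr=actual_addr)
```

In this sketch, `predict` would be called when a load is fetched and `update` when its address is resolved. A real design would do the lookup and the addition in separate pipeline stages, which a sequential model like this does not capture.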
Number of pages: 7
Journal: Proceedings of the Annual International Symposium on Microarchitecture
Publication status: Published - 1997 Dec 1
Event: Proceedings of the 1997 30th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-30 - Triangle Park, NC, USA
Duration: 1997 Dec 1 → 1997 Dec 3
All Science Journal Classification (ASJC) codes
- Hardware and Architecture