Designing a modern memory hierarchy with hardware prefetching

Wei Fen Lin, Steven K. Reinhardt, Doug Burger Burger

研究成果: Article同行評審

31 引文 斯高帕斯(Scopus)


In this paper, we address the severe performance gap caused by high processor clock rates and slow DRAM accesses. We show that, even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half its time stalling for L2 misses. Our experimental analysis begins with an effort to tune our baseline memory system aggressively: incorporating optimizations to reduce DRAM row buffer misses, reordering miss accesses to reduce queuing delay, and adjusting the L2 block size to match each channel organization. We show that there is a large gap between the block sizes at which performance is best and at which miss rate is minimized. Using those results, we evaluate a hardware prefetch unit integrated with the L2 cache and memory controllers. By issuing prefetches only when the Rambus channels are idle, prioritizing them to maximize DRAM row buffer hits, and giving them low replacement priority, we achieve a 65 percent speedup across 10 of the 26 SPEC2000 benchmarks, without degrading the performance of the others. With eight Rambus channels, these 10 benchmarks improve to within 10 percent of the performance of a perfect L2 cache.

頁(從 - 到)1202-1218
期刊IEEE Transactions on Computers
出版狀態Published - 2001 十一月

All Science Journal Classification (ASJC) codes

  • 軟體
  • 理論電腦科學
  • 硬體和架構
  • 計算機理論與數學


深入研究「Designing a modern memory hierarchy with hardware prefetching」主題。共同形成了獨特的指紋。