Rapid advances in interconnection networks for multiprocessors are closing the gap between computation and communication. Given this trend, how can we best utilize fast interconnects? This study proposes an enhanced CC-NUMA architecture, called Depot-NUMA, which treats the aggregate of the private caches across all nodes as a large remote access cache. Fast interconnects allow a missing block to be fetched from the private caches of other sharing nodes rather than from the home node. Issues involved in designing Depot-NUMA are discussed, and a novel routing scheme, called multi-hop, is proposed to communicate with the potential sharers and fetch a missing block from their private caches. The sharers are specified based on a stride function to exploit network locality in the system. The proposed Depot-NUMA design requires only modest modifications to the node controller and coherence protocol. Additionally, the interconnect fabric can be constructed from existing, unmodified commodity interconnects. Furthermore, an application-driven study reveals that Depot-NUMA can reduce read stall time by up to 41% and is competitive with a CC-NUMA equipped with a large local cache.
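The stride-based multi-hop probe described above can be sketched as follows. This is a minimal illustration under assumed parameters (a fixed stride, a bounded hop count, and set-based private caches); the names `candidate_sharers` and `multi_hop_fetch` are hypothetical and the paper's actual stride function and protocol details may differ.

```python
# Hypothetical sketch of stride-based sharer selection for a multi-hop probe
# in a Depot-NUMA-style system. Parameters and names are illustrative only.

def candidate_sharers(requester, num_nodes, stride, hops):
    """Return the nodes a multi-hop probe would visit, in order.

    Candidate sharers sit at fixed stride offsets from the requester, so
    probes stay among nearby nodes and exploit network locality.
    """
    sharers = []
    node = requester
    for _ in range(hops):
        node = (node + stride) % num_nodes
        if node == requester:  # wrapped all the way around
            break
        sharers.append(node)
    return sharers

def multi_hop_fetch(block, requester, caches, num_nodes, stride=1, hops=3):
    """Probe candidate sharers' private caches before falling back to home.

    Returns the node id that supplies the block, or None if no probed
    sharer holds it (in which case the home node would service the miss).
    """
    for node in candidate_sharers(requester, num_nodes, stride, hops):
        if block in caches[node]:
            return node  # block supplied by a nearby sharer's private cache
    return None  # miss in all probed caches: fetch from the home node
```

A probe that finds the block at an intermediate sharer avoids the round trip to a possibly distant home node, which is the source of the read-stall reduction the abstract reports.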
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Information Systems
- Hardware and Architecture