Implementation and evaluation of directory hints in CC-NUMA multiprocessors

Hung-Chang Hsiao, Chung Ta King

Research output: Contribution to journal › Article

Abstract

Directory hints (DHs) help a node in a cache-coherent non-uniform memory access (CC-NUMA) shared-memory multiprocessor keep track of where valid copies of a memory block may reside. With this information, the node can fetch the block directly from one of those nodes on a read miss (RM). This can reduce the number of network transactions needed to serve the miss and remove the expensive directory lookup from the critical path. In this paper, we discuss the issues involved in implementing the DH scheme on a CC-NUMA shared-memory multiprocessor and examine one such implementation, which stores the hints in a small, fast cache. Our simulation results show that the DH scheme can effectively reduce read stall time, and that its performance is competitive with a more expensive implementation that uses a large level-three cache. A drawback of the scheme is that it introduces extra network traffic. We believe that state-of-the-art interconnection networks, such as those built on the SGI Spider [M. Galles, Scalable pipelined interconnect for distributed endpoint routing: the SGI SPIDER chip, in: Proc. Internat. Symp. on High Performance Interconnects (Hot Interconnects 4), 1996] and Intel Cavallino [J. Carbonaro, F. Verhoorn, Cavallino: the TeraFlops router and NIC, in: Proc. Internat. Symp. on High Performance Interconnects (Hot Interconnects 4), 1996] chips, make the DH scheme feasible, and that it can remain feasible even on slower networks such as those built from Myrinet switches [N.J. Boden et al., Myrinet: a gigabit-per-second local area network, IEEE Micro, 1995, p. 29].
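To make the read-miss path described above more concrete, here is a minimal Python sketch that counts network messages for a single remote read miss with and without a directory hint. It is a toy model, not the implementation evaluated in the paper, and it rests on assumptions of our own: the hint cache is direct-mapped, a stale hint costs one negative acknowledgement before the request falls back to the home node, a correct hint triggers one sharing notice to the home sent off the critical path, and the requester and home are distinct nodes. The names `HintCache` and `read_miss_messages` are hypothetical.

```python
from typing import List, Optional, Set, Tuple


class HintCache:
    """Hypothetical small direct-mapped cache: block address -> node that MAY hold it.

    A hint is only a guess; the hinted node may no longer have a valid copy.
    """

    def __init__(self, num_entries: int) -> None:
        self.num_entries = num_entries
        # Each slot holds (block, hinted_node), or None when empty.
        self.slots: List[Optional[Tuple[int, int]]] = [None] * num_entries

    def lookup(self, block: int) -> Optional[int]:
        entry = self.slots[block % self.num_entries]
        if entry is not None and entry[0] == block:
            return entry[1]
        return None  # hint-cache miss

    def update(self, block: int, node: int) -> None:
        # Direct-mapped: a new hint overwrites whatever occupied its slot.
        self.slots[block % self.num_entries] = (block, node)


def read_miss_messages(block: int, holders: Set[int],
                       hints: Optional[HintCache] = None) -> Tuple[int, int]:
    """Return (critical_path_messages, total_messages) for one remote read miss.

    Toy assumption: `holders` is the set of remote nodes whose valid copy must
    supply the data; if it is empty, the home memory supplies the block.
    """
    critical = 0    # messages the requester must wait for
    background = 0  # messages sent off the critical path
    hinted = hints.lookup(block) if hints is not None else None
    if hinted is not None:
        critical += 1            # request goes straight to the hinted node
        if hinted in holders:
            critical += 1        # hinted node replies with the data
            background += 1      # sharing notice to the home node (assumed)
            return critical, critical + background
        critical += 1            # stale hint: negative ack, fall back to home
    critical += 1                # request to the home node; directory lookup
    if holders:
        critical += 2            # home forwards to a holder, holder sends data
    else:
        critical += 1            # home memory sends the data
    return critical, critical + background


if __name__ == "__main__":
    hints = HintCache(num_entries=4)
    print(read_miss_messages(block=9, holders={3}))               # (3, 3): no hint
    hints.update(block=9, node=3)
    print(read_miss_messages(block=9, holders={3}, hints=hints))  # (2, 3): correct hint
    hints.update(block=9, node=2)
    print(read_miss_messages(block=9, holders={3}, hints=hints))  # (5, 5): stale hint
```

In this toy model a correct hint shortens the critical path from three messages to two and takes the directory lookup off it, while a stale hint costs more than having no hint at all. That is one way to see why the benefit depends on hint accuracy and why the scheme can add network traffic.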

Original language: English
Pages (from-to): 107-132
Number of pages: 26
Journal: Parallel Computing
Volume: 28
Issue number: 1
DOI: 10.1016/S0167-8191(01)00123-5
Publication status: Published - 2002 Jan 1

Fingerprint

Interconnect
Multiprocessor
Data storage equipment
Cache
Evaluation
Shared-memory multiprocessors
Chip
Vertex of a graph
High Performance
Spiders
Routers
Local area networks
Critical Path
Interconnection Networks
Network Traffic
Router
Switches
Transactions
Switch
Routing

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

Cite this

@article{da52a7fcd6464bc0947870d6a13364e7,
title = "Implementation and evaluation of directory hints in CC-NUMA multiprocessors",
abstract = "Directory hints (DHs) help a node in a cache-coherent non-uniform memory access (CC-NUMA) shared-memory multiprocessor keep track of where valid copies of a memory block may reside. With this information, the node can fetch the block directly from one of those nodes on a read miss (RM). This can reduce the number of network transactions needed to serve the miss and remove the expensive directory lookup from the critical path. In this paper, we discuss the issues involved in implementing the DH scheme on a CC-NUMA shared-memory multiprocessor and examine one such implementation, which stores the hints in a small, fast cache. Our simulation results show that the DH scheme can effectively reduce read stall time, and that its performance is competitive with a more expensive implementation that uses a large level-three cache. A drawback of the scheme is that it introduces extra network traffic. We believe that state-of-the-art interconnection networks, such as those built on the SGI Spider [M. Galles, Scalable pipelined interconnect for distributed endpoint routing: the SGI SPIDER chip, in: Proc. Internat. Symp. on High Performance Interconnects (Hot Interconnects 4), 1996] and Intel Cavallino [J. Carbonaro, F. Verhoorn, Cavallino: the TeraFlops router and NIC, in: Proc. Internat. Symp. on High Performance Interconnects (Hot Interconnects 4), 1996] chips, make the DH scheme feasible, and that it can remain feasible even on slower networks such as those built from Myrinet switches [N.J. Boden et al., Myrinet: a gigabit-per-second local area network, IEEE Micro, 1995, p. 29].",
author = "Hsiao, Hung-Chang and King, {Chung Ta}",
year = "2002",
month = "1",
day = "1",
doi = "10.1016/S0167-8191(01)00123-5",
language = "English",
volume = "28",
pages = "107--132",
journal = "Parallel Computing",
issn = "0167-8191",
publisher = "Elsevier",
number = "1",

}

Implementation and evaluation of directory hints in CC-NUMA multiprocessors. / Hsiao, Hung-Chang; King, Chung Ta.

In: Parallel Computing, Vol. 28, No. 1, 01.01.2002, p. 107-132.

TY - JOUR

T1 - Implementation and evaluation of directory hints in CC-NUMA multiprocessors

AU - Hsiao, Hung-Chang

AU - King, Chung Ta

PY - 2002/1/1

Y1 - 2002/1/1

AB - Directory hints (DHs) help a node in a cache-coherent non-uniform memory access (CC-NUMA) shared-memory multiprocessor keep track of where valid copies of a memory block may reside. With this information, the node can fetch the block directly from one of those nodes on a read miss (RM). This can reduce the number of network transactions needed to serve the miss and remove the expensive directory lookup from the critical path. In this paper, we discuss the issues involved in implementing the DH scheme on a CC-NUMA shared-memory multiprocessor and examine one such implementation, which stores the hints in a small, fast cache. Our simulation results show that the DH scheme can effectively reduce read stall time, and that its performance is competitive with a more expensive implementation that uses a large level-three cache. A drawback of the scheme is that it introduces extra network traffic. We believe that state-of-the-art interconnection networks, such as those built on the SGI Spider [M. Galles, Scalable pipelined interconnect for distributed endpoint routing: the SGI SPIDER chip, in: Proc. Internat. Symp. on High Performance Interconnects (Hot Interconnects 4), 1996] and Intel Cavallino [J. Carbonaro, F. Verhoorn, Cavallino: the TeraFlops router and NIC, in: Proc. Internat. Symp. on High Performance Interconnects (Hot Interconnects 4), 1996] chips, make the DH scheme feasible, and that it can remain feasible even on slower networks such as those built from Myrinet switches [N.J. Boden et al., Myrinet: a gigabit-per-second local area network, IEEE Micro, 1995, p. 29].

UR - http://www.scopus.com/inward/record.url?scp=0036133210&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036133210&partnerID=8YFLogxK

U2 - 10.1016/S0167-8191(01)00123-5

DO - 10.1016/S0167-8191(01)00123-5

M3 - Article

VL - 28

SP - 107

EP - 132

JO - Parallel Computing

JF - Parallel Computing

SN - 0167-8191

IS - 1

ER -