TY - JOUR
T1 - TCAM-GNN
T2 - A TCAM-based Data Processing Strategy for GNN over Sparse Graphs
AU - Wang, Yu-Pang
AU - Wang, Wei-Chen
AU - Chang, Yuan-Hao
AU - Tsai, Chieh-Lin
AU - Kuo, Tei-Wei
AU - Wu, Chun-Feng
AU - Ho, Chien-Chung
AU - Hu, Han-Wen
N1 - Publisher Copyright:
IEEE
PY - 2023
Y1 - 2023
N2 - The graph neural network (GNN) has recently become an emerging research topic for processing non-Euclidean data structures, since the data in many popular application domains, such as social networks, recommendation systems, and computer vision, are usually modeled as graphs. Previous GNN accelerators commonly adopt a hybrid architecture to handle the “hybrid computing pattern” of GNN training. Nevertheless, the hybrid architecture suffers from poor utilization of hardware resources, mainly due to the dynamic workloads across the different phases of GNN computation. To address this issue, other GNN accelerators adopt a unified structure with numerous processing elements and high-bandwidth memory. However, the large amount of data movement between the processor and memory can heavily degrade the performance of such accelerators on real-world graphs. As a result, processing-in-memory architectures, such as the ReRAM-based crossbar, become a promising solution to reduce the memory overhead of GNN training. Furthermore, the ternary content addressable memory (TCAM) can search the stored data against a given input in parallel and output the matching result, which makes it well suited to efficiently finding connected vertices in the edge list of a graph. In this work, we present TCAM-GNN, a novel TCAM-based data processing strategy that enables high-throughput and energy-efficient GNN training on a ReRAM-based crossbar architecture. The proposed TCAM-based data processing approach utilizes the TCAM crossbar to access and search graph data. Several hardware co-designed data structures and placement methods are proposed to fully exploit the parallelism of GNNs during training. In addition, to resolve the precision issue, we propose a dynamic fixed-point formatting approach that enables GNN training on the crossbar architecture. An adaptive data reusing policy is also proposed to enhance the data locality of graph features through a bootstrapping batch sampling approach. Overall, TCAM-GNN enhances computing performance by 4.25× and energy efficiency by 9.11× on average compared to neural network accelerators.
AB - The graph neural network (GNN) has recently become an emerging research topic for processing non-Euclidean data structures, since the data in many popular application domains, such as social networks, recommendation systems, and computer vision, are usually modeled as graphs. Previous GNN accelerators commonly adopt a hybrid architecture to handle the “hybrid computing pattern” of GNN training. Nevertheless, the hybrid architecture suffers from poor utilization of hardware resources, mainly due to the dynamic workloads across the different phases of GNN computation. To address this issue, other GNN accelerators adopt a unified structure with numerous processing elements and high-bandwidth memory. However, the large amount of data movement between the processor and memory can heavily degrade the performance of such accelerators on real-world graphs. As a result, processing-in-memory architectures, such as the ReRAM-based crossbar, become a promising solution to reduce the memory overhead of GNN training. Furthermore, the ternary content addressable memory (TCAM) can search the stored data against a given input in parallel and output the matching result, which makes it well suited to efficiently finding connected vertices in the edge list of a graph. In this work, we present TCAM-GNN, a novel TCAM-based data processing strategy that enables high-throughput and energy-efficient GNN training on a ReRAM-based crossbar architecture. The proposed TCAM-based data processing approach utilizes the TCAM crossbar to access and search graph data. Several hardware co-designed data structures and placement methods are proposed to fully exploit the parallelism of GNNs during training. In addition, to resolve the precision issue, we propose a dynamic fixed-point formatting approach that enables GNN training on the crossbar architecture. An adaptive data reusing policy is also proposed to enhance the data locality of graph features through a bootstrapping batch sampling approach. Overall, TCAM-GNN enhances computing performance by 4.25× and energy efficiency by 9.11× on average compared to neural network accelerators.
UR - http://www.scopus.com/inward/record.url?scp=85181581812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181581812&partnerID=8YFLogxK
U2 - 10.1109/TETC.2023.3328008
DO - 10.1109/TETC.2023.3328008
M3 - Article
AN - SCOPUS:85181581812
SN - 2168-6750
SP - 1
EP - 14
JO - IEEE Transactions on Emerging Topics in Computing
JF - IEEE Transactions on Emerging Topics in Computing
ER -