Redio: Accelerating Disk-Based Graph Processing by Reducing Disk I/Os

Chengwen Wu, Guangyan Zhang, Yang Wang, Xinyang Jiang, Weimin Zheng

Research output: Contribution to journalArticle

Abstract

Disk-based graph systems store part or all of graph data on external devices like hard drives or SSDs, achieving scalability without excessive hardware. However, massive expensive disk I/Os remain the major performance bottleneck of disk-based graph processing. In this paper, we propose Redio, a new approach to accelerating disk-based graph processing by reducing disk I/Os. First, Redio observes that it is feasible to accommodate all vertex states in main memory and this can eliminate almost all vertex-related disk I/Os. Second, Redio introduces a dynamic selective scheduling scheme to identify inactive edges in each iteration and skip them when and only when such skipping can bring performance benefit. To improve its effectiveness, Redioin corporates a compact edge storage to improve data locality and an indexed bitmap to minimize its memory and computation overheads. We have implemented a single-node prototype for Redio under the edge-centric computation model. Extensive experiments show that Redio consistently outperforms well-known edge-centric disk-based systems in all experiments, delivering an average speedup of 4.33 × on HDDs and 5.33 × on SSDs over the fastest among them (i.e., GridGraph). Experimental results also show that Redio delivers an average speedup of 3.13 × on HDDs and 1.28 × on SSDs over the fastest among representative vertex-centric disk-based systems (i.e., FlashGraph).

Original languageEnglish
Article number8489961
Pages (from-to)414-425
Number of pages12
JournalIEEE Transactions on Computers
Volume68
Issue number3
DOIs
Publication statusPublished - 2019 Mar 1

Fingerprint

Data storage equipment
Graph in graph theory
Processing
Scalability
Experiments
Scheduling
Hardware
Vertex of a graph
Speedup
Data Locality
Experiment
Eliminate
Prototype
Minimise
Iteration
Experimental Results

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Wu, Chengwen ; Zhang, Guangyan ; Wang, Yang ; Jiang, Xinyang ; Zheng, Weimin. / Redio : Accelerating Disk-Based Graph Processing by Reducing Disk I/Os. In: IEEE Transactions on Computers. 2019 ; Vol. 68, No. 3. pp. 414-425.
@article{32123fcd835b4c84af5fe302ee8104dd,
title = "Redio: Accelerating Disk-Based Graph Processing by Reducing Disk I/Os",
abstract = "Disk-based graph systems store part or all of graph data on external devices like hard drives or SSDs, achieving scalability without excessive hardware. However, massive expensive disk I/Os remain the major performance bottleneck of disk-based graph processing. In this paper, we propose Redio, a new approach to accelerating disk-based graph processing by reducing disk I/Os. First, Redio observes that it is feasible to accommodate all vertex states in main memory and this can eliminate almost all vertex-related disk I/Os. Second, Redio introduces a dynamic selective scheduling scheme to identify inactive edges in each iteration and skip them when and only when such skipping can bring performance benefit. To improve its effectiveness, Redioin corporates a compact edge storage to improve data locality and an indexed bitmap to minimize its memory and computation overheads. We have implemented a single-node prototype for Redio under the edge-centric computation model. Extensive experiments show that Redio consistently outperforms well-known edge-centric disk-based systems in all experiments, delivering an average speedup of 4.33 × on HDDs and 5.33 × on SSDs over the fastest among them (i.e., GridGraph). Experimental results also show that Redio delivers an average speedup of 3.13 × on HDDs and 1.28 × on SSDs over the fastest among representative vertex-centric disk-based systems (i.e., FlashGraph).",
author = "Chengwen Wu and Guangyan Zhang and Yang Wang and Xinyang Jiang and Weimin Zheng",
year = "2019",
month = "3",
day = "1",
doi = "10.1109/TC.2018.2875458",
language = "English",
volume = "68",
pages = "414--425",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "3",

}

Redio : Accelerating Disk-Based Graph Processing by Reducing Disk I/Os. / Wu, Chengwen; Zhang, Guangyan; Wang, Yang; Jiang, Xinyang; Zheng, Weimin.

In: IEEE Transactions on Computers, Vol. 68, No. 3, 8489961, 01.03.2019, p. 414-425.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Redio

T2 - Accelerating Disk-Based Graph Processing by Reducing Disk I/Os

AU - Wu, Chengwen

AU - Zhang, Guangyan

AU - Wang, Yang

AU - Jiang, Xinyang

AU - Zheng, Weimin

PY - 2019/3/1

Y1 - 2019/3/1

N2 - Disk-based graph systems store part or all of graph data on external devices like hard drives or SSDs, achieving scalability without excessive hardware. However, massive expensive disk I/Os remain the major performance bottleneck of disk-based graph processing. In this paper, we propose Redio, a new approach to accelerating disk-based graph processing by reducing disk I/Os. First, Redio observes that it is feasible to accommodate all vertex states in main memory and this can eliminate almost all vertex-related disk I/Os. Second, Redio introduces a dynamic selective scheduling scheme to identify inactive edges in each iteration and skip them when and only when such skipping can bring performance benefit. To improve its effectiveness, Redioin corporates a compact edge storage to improve data locality and an indexed bitmap to minimize its memory and computation overheads. We have implemented a single-node prototype for Redio under the edge-centric computation model. Extensive experiments show that Redio consistently outperforms well-known edge-centric disk-based systems in all experiments, delivering an average speedup of 4.33 × on HDDs and 5.33 × on SSDs over the fastest among them (i.e., GridGraph). Experimental results also show that Redio delivers an average speedup of 3.13 × on HDDs and 1.28 × on SSDs over the fastest among representative vertex-centric disk-based systems (i.e., FlashGraph).

AB - Disk-based graph systems store part or all of graph data on external devices like hard drives or SSDs, achieving scalability without excessive hardware. However, massive expensive disk I/Os remain the major performance bottleneck of disk-based graph processing. In this paper, we propose Redio, a new approach to accelerating disk-based graph processing by reducing disk I/Os. First, Redio observes that it is feasible to accommodate all vertex states in main memory and this can eliminate almost all vertex-related disk I/Os. Second, Redio introduces a dynamic selective scheduling scheme to identify inactive edges in each iteration and skip them when and only when such skipping can bring performance benefit. To improve its effectiveness, Redioin corporates a compact edge storage to improve data locality and an indexed bitmap to minimize its memory and computation overheads. We have implemented a single-node prototype for Redio under the edge-centric computation model. Extensive experiments show that Redio consistently outperforms well-known edge-centric disk-based systems in all experiments, delivering an average speedup of 4.33 × on HDDs and 5.33 × on SSDs over the fastest among them (i.e., GridGraph). Experimental results also show that Redio delivers an average speedup of 3.13 × on HDDs and 1.28 × on SSDs over the fastest among representative vertex-centric disk-based systems (i.e., FlashGraph).

UR - http://www.scopus.com/inward/record.url?scp=85054641333&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054641333&partnerID=8YFLogxK

U2 - 10.1109/TC.2018.2875458

DO - 10.1109/TC.2018.2875458

M3 - Article

AN - SCOPUS:85054641333

VL - 68

SP - 414

EP - 425

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 3

M1 - 8489961

ER -