A self-adjusting data distribution mechanism for multidimensional load balancing in multiprocessor-based database systems

Chiang Lee, Kien A. Hua

Research output: Contribution to journalArticle

5 Citations (Scopus)

Abstract

With the advent of micro-processor, memory, and communication technology, it is economically feasible to develop a parallel database computer system to improve the performance of database systems. Relations in such an environment are usually partitioned and distributed across computing units. To achieve the optimal performance, it is essential for each unit to have a perfectly balanced load (i.e., identical amount of data). However, fragment sizes may vary due to insertions to and deletions from a relation. To retain good performance, the system needs to periodically rebalance the load of the processors by redistributing data among computing units. Traditionally, the redistribution is performed by reshuffling tuples among processors through a relation repartitioning (e.g., rehashing) process. The computation of this process is at the tuple level. In this paper, we present a self-adjusting data distribution scheme which balances computer workload at a cell (coarser grain than tuple) level during query processing to minimize redistribution cost. The entire scheme is built on top of the popular grid file structure. The adaptivity of the scheme and its relevant features are discussed. The cost of load rebalancing is estimated. The result shows that under our assumptions, it is always beneficial to rebalance computer workload before performing a join on skewed data.

Original languageEnglish
Pages (from-to)549-567
Number of pages19
JournalInformation Systems
Volume19
Issue number7
DOIs
Publication statusPublished - 1994 Jan 1

Fingerprint

Resource allocation
Query processing
Distributed computer systems
Costs
Computer systems
Data storage equipment
Communication

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

@article{14f01ad2fd9144d9b66d281d660192fe,
title = "A self-adjusting data distribution mechanism for multidimensional load balancing in multiprocessor-based database systems",
abstract = "With the advent of micro-processor, memory, and communication technology, it is economically feasible to develop a parallel database computer system to improve the performance of database systems. Relations in such an environment are usually partitioned and distributed across computing units. To achieve the optimal performance, it is essential for each unit to have a perfectly balanced load (i.e., identical amount of data). However, fragment sizes may vary due to insertions to and deletions from a relation. To retain good performance, the system needs to periodically rebalance the load of the processors by redistributing data among computing units. Traditionally, the redistribution is performed by reshuffling tuples among processors through a relation repartitioning (e.g., rehashing) process. The computation of this process is at the tuple level. In this paper, we present a self-adjusting data distribution scheme which balances computer workload at a cell (coarser grain than tuple) level during query processing to minimize redistribution cost. The entire scheme is built on top of the popular grid file structure. The adaptivity of the scheme and its relevant features are discussed. The cost of load rebalancing is estimated. The result shows that under our assumptions, it is always beneficial to rebalance computer workload before performing a join on skewed data.",
author = "Chiang Lee and Hua, {Kien A.}",
year = "1994",
month = "1",
day = "1",
doi = "10.1016/0306-4379(94)90014-0",
language = "English",
volume = "19",
pages = "549--567",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",
number = "7",

}

A self-adjusting data distribution mechanism for multidimensional load balancing in multiprocessor-based database systems. / Lee, Chiang; Hua, Kien A.

In: Information Systems, Vol. 19, No. 7, 01.01.1994, p. 549-567.

Research output: Contribution to journalArticle

TY - JOUR

T1 - A self-adjusting data distribution mechanism for multidimensional load balancing in multiprocessor-based database systems

AU - Lee, Chiang

AU - Hua, Kien A.

PY - 1994/1/1

Y1 - 1994/1/1

N2 - With the advent of micro-processor, memory, and communication technology, it is economically feasible to develop a parallel database computer system to improve the performance of database systems. Relations in such an environment are usually partitioned and distributed across computing units. To achieve the optimal performance, it is essential for each unit to have a perfectly balanced load (i.e., identical amount of data). However, fragment sizes may vary due to insertions to and deletions from a relation. To retain good performance, the system needs to periodically rebalance the load of the processors by redistributing data among computing units. Traditionally, the redistribution is performed by reshuffling tuples among processors through a relation repartitioning (e.g., rehashing) process. The computation of this process is at the tuple level. In this paper, we present a self-adjusting data distribution scheme which balances computer workload at a cell (coarser grain than tuple) level during query processing to minimize redistribution cost. The entire scheme is built on top of the popular grid file structure. The adaptivity of the scheme and its relevant features are discussed. The cost of load rebalancing is estimated. The result shows that under our assumptions, it is always beneficial to rebalance computer workload before performing a join on skewed data.

AB - With the advent of micro-processor, memory, and communication technology, it is economically feasible to develop a parallel database computer system to improve the performance of database systems. Relations in such an environment are usually partitioned and distributed across computing units. To achieve the optimal performance, it is essential for each unit to have a perfectly balanced load (i.e., identical amount of data). However, fragment sizes may vary due to insertions to and deletions from a relation. To retain good performance, the system needs to periodically rebalance the load of the processors by redistributing data among computing units. Traditionally, the redistribution is performed by reshuffling tuples among processors through a relation repartitioning (e.g., rehashing) process. The computation of this process is at the tuple level. In this paper, we present a self-adjusting data distribution scheme which balances computer workload at a cell (coarser grain than tuple) level during query processing to minimize redistribution cost. The entire scheme is built on top of the popular grid file structure. The adaptivity of the scheme and its relevant features are discussed. The cost of load rebalancing is estimated. The result shows that under our assumptions, it is always beneficial to rebalance computer workload before performing a join on skewed data.

UR - http://www.scopus.com/inward/record.url?scp=0000549068&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0000549068&partnerID=8YFLogxK

U2 - 10.1016/0306-4379(94)90014-0

DO - 10.1016/0306-4379(94)90014-0

M3 - Article

VL - 19

SP - 549

EP - 567

JO - Information Systems

JF - Information Systems

SN - 0306-4379

IS - 7

ER -