TY - JOUR

T1 - Data partitioning for multicomputer database systems

T2 - A cell-based approach

AU - Hua, Kien A.

AU - Chiang, Lee

AU - Young, Honesty C.

PY - 1993/7

Y1 - 1993/7

N2 - This paper deals with load balancing in distributed memory parallel database computers. In such an environment, a relation is typically partitioned and distributed across a set of processing nodes. An efficient load balancing strategy, therefore, is the determining factor for the performance improvement of such systems. We introduce the notion of cell as the unit of data partition and data movement, and devise a greedy algorithm for the initial data distribution and an iterative method for subsequent rebalancing. A directory is used to map a cell to a processing node, thus, two logically consecutive cells can be assigned independently. We also develop an analytical model to compare our scheme to two known methods. The results indicate that the proposed technique exhibits significant rebalancing costs reduction (in terms of response time and throughput) over existing approaches. We show that, for a given set of system parameters, the maximum possible tuple distribution imbalance that the initial data distribution algorithm and the subsequent data rebalancing algorithm may cause. We then attain a mechanism to determine the appropriate number of cells that considers both the processing overhead and the maximum possible data partition imbalance.

AB - This paper deals with load balancing in distributed memory parallel database computers. In such an environment, a relation is typically partitioned and distributed across a set of processing nodes. An efficient load balancing strategy, therefore, is the determining factor for the performance improvement of such systems. We introduce the notion of cell as the unit of data partition and data movement, and devise a greedy algorithm for the initial data distribution and an iterative method for subsequent rebalancing. A directory is used to map a cell to a processing node, thus, two logically consecutive cells can be assigned independently. We also develop an analytical model to compare our scheme to two known methods. The results indicate that the proposed technique exhibits significant rebalancing costs reduction (in terms of response time and throughput) over existing approaches. We show that, for a given set of system parameters, the maximum possible tuple distribution imbalance that the initial data distribution algorithm and the subsequent data rebalancing algorithm may cause. We then attain a mechanism to determine the appropriate number of cells that considers both the processing overhead and the maximum possible data partition imbalance.

UR - http://www.scopus.com/inward/record.url?scp=0039434721&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0039434721&partnerID=8YFLogxK

U2 - 10.1016/0306-4379(93)90032-V

DO - 10.1016/0306-4379(93)90032-V

M3 - Article

AN - SCOPUS:0039434721

SN - 0306-4379

VL - 18

SP - 329

EP - 342

JO - Information Systems

JF - Information Systems

IS - 5

ER -