This paper presents a methodology for optimizing parallel join execution. Past research on parallel join methods has mostly focused on designing algorithms that partition (e.g., hash) relations and distribute the data buckets as evenly as possible among the processors. Once the data are distributed, these methods assume that all processors will complete their tasks at about the same time. We stress that this is true if no further information, such as a page-level join index (discussed later), is available. Otherwise, the join execution can be further optimized, and the workload on the processors may still be unbalanced. We study the problems that may arise in a shared-nothing architecture and propose algorithms to solve them. A simulation study is also performed to understand the characteristics of the proposed method.
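For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a hash-partitioned parallel join. It illustrates only the conventional approach (partition both relations on the join attribute, then join co-located buckets). It is not the paper's proposed algorithm and does not model a page-level join index. The function names, the tuple layout (join attribute in position 0), and the sequential simulation of the processors are all assumptions made for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key_index, num_processors):
    """Assign each tuple to a (simulated) processor by hashing its join attribute."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key_index]) % num_processors].append(row)
    return buckets


def local_join(r_bucket, s_bucket, r_key=0, s_key=0):
    """Join two co-located buckets using an in-memory hash table built on R."""
    table = defaultdict(list)
    for r in r_bucket:
        table[r[r_key]].append(r)
    # Probe with each S tuple; concatenate matching R and S tuples.
    return [r + s for s in s_bucket for r in table.get(s[s_key], [])]


def parallel_hash_join(r_rows, s_rows, num_processors):
    """Hypothetical driver: runs each processor's local join one after another."""
    r_parts = hash_partition(r_rows, 0, num_processors)
    s_parts = hash_partition(s_rows, 0, num_processors)
    return {p: local_join(r_parts[p], s_parts[p]) for p in range(num_processors)}


if __name__ == "__main__":
    # Integer join keys keep the partitioning deterministic across runs.
    r = [(k, f"r{k}") for k in range(8)]
    s = [(k % 4, f"s{k}") for k in range(8)]
    per_processor = parallel_hash_join(r, s, num_processors=4)
    # Result count per processor is a rough proxy for its workload.
    print({p: len(out) for p, out in per_processor.items()})
```

Comparing the bucket sizes with the per-processor result counts is one simple way to see that evenly sized buckets do not by themselves guarantee evenly sized join workloads.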
| Number of pages | 15 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Publication status | Published - 1995 Dec |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics