Machine and network failures degrade the results of applications executed on a distributed system, so application reliability is an important concern. Recovering failed machines may also increase processing time. This work studies the expected makespan of task schedules in heterogeneous distributed systems. A task may be replicated several times to reduce its expected execution time. A two-phase algorithm is proposed. The first phase uses a linear programming formulation and a rounding procedure to obtain a favorable allotment for minimizing the expected makespan. The second phase applies a scheduling method based on the expected execution time and the communication time. During execution, two strategies are considered. In the first strategy, no replica of a task running on a set of processors can be stopped. In the second strategy, once one replica of a task has completed, the other replicas of that task are immediately aborted. A comparison shows that the proposed algorithm significantly outperforms previously proposed algorithms in terms of schedule length ratio, reliability, and speedup.
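The effect of replication under the second strategy can be illustrated with a minimal sketch. It is not the paper's algorithm or failure model: it assumes, purely for illustration, that each replica either completes after a fixed execution time with an independent success probability or never completes. The function name `replicated_task_stats` and its input format are hypothetical.

```python
from typing import List, Tuple


def replicated_task_stats(replicas: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (reliability, expected completion time given at least one success).

    Each replica is an (execution_time, success_probability) pair.
    Assumption (not from the paper): replicas fail independently, and a
    failed replica never finishes. The task completes when the fastest
    successful replica finishes; all other replicas are then aborted.
    """
    all_faster_failed = 1.0  # probability that every faster replica has failed
    weighted_time = 0.0      # sum of time * P(this replica is the first to succeed)
    for time, p_ok in sorted(replicas):  # fastest replica first
        weighted_time += time * p_ok * all_faster_failed
        all_faster_failed *= 1.0 - p_ok
    reliability = 1.0 - all_faster_failed
    if reliability == 0.0:
        return 0.0, float("inf")  # the task can never complete
    return reliability, weighted_time / reliability


# One replica versus two replicas on processors of different speed.
single = replicated_task_stats([(20.0, 0.5)])
double = replicated_task_stats([(20.0, 0.5), (10.0, 0.5)])
print(single, double)
```

Under these assumptions, adding a faster replica raises the reliability and lowers the conditional expected completion time, which is the trade-off an allotment phase has to weigh against the extra processors consumed.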
| Number of pages | 12 |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| Publication status | Published - 2016 Feb 1 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Hardware and Architecture
- Computational Theory and Mathematics