A parallel and distributed C4.5 algorithm in cloud computing environments

  • Kawuu W. Lin
  • Ya Jun Zheng
  • Ju Chin Chen
  • Wei Chiang Wang
  • Chao Chun Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Data mining seeks to derive significant insights from big data, offering valuable information for decision-makers and generating economic value. Among its tasks, classification is one of the most essential, with decision tree algorithms being a widely adopted solution due to their efficiency in addressing association rules and classification challenges. Decision trees are particularly advantageous for their ability to provide interpretable results at minimal computational cost. However, the exponential growth of data in the Internet era has highlighted the limitations of traditional algorithms, which struggle to efficiently process large-scale datasets.

To address this issue, this study introduces PD-C4.5, a parallel and distributed implementation of the C4.5 algorithm designed for cloud computing environments. By incorporating a microservices architecture, the proposed approach modularizes computation, enabling flexible, scalable, and distributed execution. This design not only enhances system maintainability but also optimizes resource utilization while maintaining high computational efficiency.

Experimental evaluations demonstrate that PD-C4.5 significantly outperforms Original C4.5 and MR-C4.5 in handling medium and large datasets, achieving notable reductions in computation time and resource consumption. Additionally, the integration of microservices ensures seamless scalability to accommodate increasing data volumes. This study provides a novel and practical solution for large-scale data classification, bridging the gap between computational efficiency and scalability in the era of big data.
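For readers unfamiliar with C4.5, the following is a minimal sketch of its split criterion, the gain ratio, computed from class counts that are tallied per data partition and then merged. It is a generic illustration of why the counting step lends itself to distributed execution, not the PD-C4.5 implementation described in the article. The function names (`partition_counts`, `gain_ratio`), the row format, and the toy data are assumptions made for this example, and only categorical attributes are handled.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a sized collection of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def partition_counts(rows, attr, label):
    """Tally (attribute value, class label) pairs within one data partition.

    Hypothetical helper: each worker could run this on its own rows.
    """
    return Counter((row[attr], row[label]) for row in rows)


def gain_ratio(pair_counts):
    """Gain ratio of a categorical attribute from merged pair counts."""
    total = sum(pair_counts.values())
    class_counts, value_counts = Counter(), Counter()
    for (value, cls), n in pair_counts.items():
        class_counts[cls] += n
        value_counts[value] += n

    # Expected class entropy after splitting on the attribute.
    remainder = 0.0
    for value, n_value in value_counts.items():
        subset = [n for (v, _), n in pair_counts.items() if v == value]
        remainder += n_value / total * entropy(subset)

    gain = entropy(class_counts.values()) - remainder
    split_info = entropy(value_counts.values())
    return gain / split_info if split_info > 0 else 0.0


# Toy data split across two partitions (illustrative only).
part_a = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
]
part_b = [{"outlook": "rain", "play": "yes"}]

# Counter addition merges the per-partition tallies.
merged = partition_counts(part_a, "outlook", "play") + partition_counts(
    part_b, "outlook", "play"
)
print(gain_ratio(merged))
```

Because the tallies are simple counts, merging them is associative and order-independent, so partitions can be counted separately and combined afterwards. How PD-C4.5 itself divides this work among microservices is not specified in the abstract.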

Original language: English
Article number: 68
Journal: Computing
Volume: 107
Issue number: 2
Publication status: Published - 2025 Feb

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Numerical Analysis
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
