Clustering Algorithms in an Educational Context: An Automatic Comparative Approach

Danial Hooshyar, Yeongwook Yang, Margus Pedaste, Yueh Min Huang

Research output: Contribution to journal › Article › peer-review

Abstract

Despite an increasing consensus regarding the significance of properly identifying the most suitable clustering method for a given problem, a surprising amount of educational research, including both educational data mining (EDM) and learning analytics (LA), neglects this critical task. This shortcoming could in many cases have a negative impact on the prediction power of both EDM- and LA-based approaches. To address such issues, this work proposes an evaluation approach that automatically compares several clustering methods using multiple internal and external performance measures on 9 real-world educational datasets of different sizes, created from the University of Tartu's Moodle system, to produce two-way clustering. Moreover, to investigate the possible effect of normalization on the performance of the clustering algorithms, this work performs the same experiment on a normalized version of the datasets. Since such an exhaustive evaluation includes multiple criteria, the proposed approach employs a multiple criteria decision-making method (i.e., TOPSIS) to rank the most suitable methods for each dataset. Our results reveal that the proposed approach can automatically compare the performance of the clustering methods and accordingly recommend the most suitable method for each dataset. Furthermore, our results show that in both normalized and nonnormalized datasets of different sizes with 10 features, DBSCAN and k-medoids are the best clustering methods, whereas agglomerative and spectral methods appear to be among the most stable and high-performing clustering methods for such datasets with 15 features. Regarding datasets with more than 15 features, OPTICS is among the top-ranked algorithms for the nonnormalized datasets, and k-medoids is the best for the normalized datasets. Interestingly, our findings reveal that normalization may have a negative effect on the performance of certain methods, e.g., spectral clustering and OPTICS; however, it appears to mostly have a positive impact on all of the other clustering methods.
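
The TOPSIS ranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `topsis`, the method labels, the three criteria (two "higher is better" measures and one "lower is better" measure), the equal weights, and all score values are hypothetical and chosen only to show the mechanics of the standard TOPSIS procedure (vector normalization, weighting, distance to the ideal and anti-ideal points, closeness coefficient).

```python
import math


def topsis(scores, weights, benefit):
    """Return one closeness coefficient per alternative (higher is better).

    scores  -- rows are alternatives (clustering methods), columns are criteria
    weights -- one weight per criterion
    benefit -- True if larger values of the criterion are better, else False
    """
    n_criteria = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
        for j in range(n_criteria)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in scores
    ]
    # Ideal and anti-ideal value for every criterion.
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Closeness = distance to anti-ideal / (distance to ideal + anti-ideal).
    closeness = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 0.0)
    return closeness


# Hypothetical scores, NOT taken from the paper.
# Columns: internal measure (higher is better), internal measure
# (lower is better), external measure (higher is better).
methods = ["method_a", "method_b", "method_c"]
scores = [
    [0.6, 0.8, 0.7],
    [0.4, 1.2, 0.5],
    [0.2, 1.5, 0.3],
]
weights = [1 / 3, 1 / 3, 1 / 3]  # equal weights are an assumption
benefit = [True, False, True]

closeness = topsis(scores, weights, benefit)
ranking = [m for _, m in sorted(zip(closeness, methods), reverse=True)]
print(ranking)
```

In this toy input `method_a` is best on every criterion, so it coincides with the ideal point and is ranked first. With real measures the criteria typically disagree, which is the situation a multiple criteria decision-making method is meant to resolve.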

Original language: English
Article number: 9162041
Pages (from-to): 146994-147014
Number of pages: 21
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Materials Science(all)
  • Engineering(all)
