Despite an increasing consensus regarding the significance of properly identifying the most suitable clustering method for a given problem, a surprising amount of educational research, including both educational data mining (EDM) and learning analytics (LA), neglects this critical task. This shortcoming could in many cases have a negative impact on the prediction power of both the EDM and LA based approaches. To address such issues, this work proposes an evaluation approach that automatically compares several clustering methods using multiple internal and external performance measures on 9 real-world educational datasets of different sizes, created from the University of Tartu's Moodle system, to produce two-way clustering. Moreover, to investigate the possible effect of normalization on the performance of the clustering algorithms, this work performs the same experiment on a normalized version of the datasets. Since such an exhaustive evaluation includes multiple criteria, the proposed approach employs a multiple criteria decision-making method (i.e., TOPSIS) to rank the most suitable methods for each dataset. Our results reveal that the proposed approach can automatically compare the performance of the clustering methods and accordingly recommend the most suitable method for each dataset. Furthermore, our results show that in both normalized and nonnormalized datasets of different sizes with 10 features, DBSCAN and k-medoids are the best clustering methods, whereas agglomerative and spectral methods appear to be among the most stable and highly performing clustering methods for such datasets with 15 features. Regarding datasets with more than 15 features, OPTICS is among the top-ranked algorithms among the nonnormalized datasets, and k-medoids is the best among the normalized datasets. Interestingly, our findings reveal that normalization may have a negative effect on the performance of certain methods, e.g., spectral clustering and OPTICS; however, it appears to mostly have a positive impact on all of the other clustering methods.
All Science Journal Classification (ASJC) codes
- Computer Science(all)
- Materials Science(all)