Evaluation measures for classification algorithms have always been an indispensable part of data mining and machine learning. Some measures, such as ROC, are difficult to subject to an effective parametric statistical test for comparing classification algorithms because of their inherently graphical form. This study therefore first discusses the evaluation measures commonly used with imbalanced data across widespread classification algorithms and explains the distinctive visual characteristics of the ROC measure. It then explains that calculating the AUC (area under the ROC curve) provides a numerical way for users to evaluate and compare two classification algorithms. However, because the population distribution of the AUC under different classification algorithms' performance is hard to obtain, users can only test algorithms' classification performance through nonparametric statistical methods. Compared with nonparametric methods, parametric statistical methods can offer greater statistical power in hypothesis testing. This thesis therefore introduces a parametric statistical method based on the CLT (Central Limit Theorem) for comparing the performance of two classification algorithms on imbalanced data. The experiments reveal that although the parametric method can improve statistical power, there is still no significant difference between the parametric and nonparametric methods in the hypothesis tests when comparing two classification algorithms' performance on imbalanced data.
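A minimal sketch of the kind of comparison the abstract describes: fold-wise AUCs of two classifiers on imbalanced data are compared with a paired t-test (parametric, whose use the CLT motivates for averaged AUCs) and with the Wilcoxon signed-rank test (nonparametric). The synthetic dataset, the two scikit-learn models, and the 10-fold design are illustrative assumptions, not the thesis's actual experimental setup.

```python
# Sketch only: parametric vs. nonparametric comparison of two
# classifiers' AUCs on imbalanced data. Models and data are assumed.
import numpy as np
from scipy.stats import ttest_rel, wilcoxon
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB

# Synthetic imbalanced two-class data: roughly 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1],
                           random_state=0)

aucs_a, aucs_b = [], []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train, test in cv.split(X, y):
    for model, scores in ((LogisticRegression(max_iter=1000), aucs_a),
                          (GaussianNB(), aucs_b)):
        model.fit(X[train], y[train])
        # AUC is computed from positive-class probabilities; it is
        # threshold-free, which suits imbalanced data.
        scores.append(roc_auc_score(
            y[test], model.predict_proba(X[test])[:, 1]))

diff = np.array(aucs_a) - np.array(aucs_b)
t_stat, p_param = ttest_rel(aucs_a, aucs_b)    # parametric (CLT-based)
w_stat, p_nonparam = wilcoxon(aucs_a, aucs_b)  # nonparametric
print(f"mean AUC difference = {diff.mean():.4f}")
print(f"paired t-test p = {p_param:.4f}, Wilcoxon p = {p_nonparam:.4f}")
```

With only ten paired fold AUCs, the two tests often agree, which is consistent with the abstract's conclusion that the parametric method, despite its greater power, does not discriminate significantly better than the nonparametric one here.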
Date of Award | 2020
Original language | English
Supervisor | Tzu-Tsung Wong (Supervisor)
Parametric Statistical Methods for Comparing the Performance of Classification Algorithms on Imbalanced Data by AUC Measure
柏傑, 王. (Author). 2020
Student thesis: Doctoral Thesis