Classification is an essential task for predicting the class values of new instances. Both k-fold and leave-one-out cross validation are widely used to evaluate the performance of classification algorithms. Many data mining texts describe how to carry out these two kinds of cross validation and which statistical methods can be used to analyze the resulting accuracies, but their descriptions are not always consistent with one another. Analysts can therefore be confused about how to perform a cross validation procedure. In this paper, the independence assumptions in cross validation are introduced, and the circumstances under which they are satisfied are addressed. The independence assumptions are then used to derive the sampling distributions of the point estimators for k-fold and leave-one-out cross validation. The cross validation procedure that yields such sampling distributions is discussed to provide new insights into evaluating the performance of classification algorithms.
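For readers unfamiliar with the two procedures, the following is a minimal sketch of how an accuracy point estimate is commonly computed under k-fold cross validation, with leave-one-out as the special case where k equals the number of instances. It is an illustration under stated assumptions, not the procedure derived in the paper: the function names (`k_fold_indices`, `majority_classifier`, `cross_validate`), the toy data, and the majority-class learner are all hypothetical placeholders.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Striding by k yields folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def majority_classifier(train_labels):
    """Toy learner (assumption): always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda x: most_common


def cross_validate(X, y, k, seed=0):
    """Return (per-fold accuracies, pooled accuracy over all instances)."""
    folds = k_fold_indices(len(y), k, seed)
    fold_acc, correct = [], 0
    for test_idx in folds:
        held_out = set(test_idx)
        # Train on every instance that is not in the current test fold.
        train_labels = [y[i] for i in range(len(y)) if i not in held_out]
        predict = majority_classifier(train_labels)
        hits = sum(predict(X[i]) == y[i] for i in test_idx)
        fold_acc.append(hits / len(test_idx))
        correct += hits
    return fold_acc, correct / len(y)


# Toy data (assumption): 7 instances of class 0 and 3 of class 1.
X = list(range(10))
y = [0] * 7 + [1] * 3

fold_acc_5, acc_5 = cross_validate(X, y, k=5)          # 5-fold
fold_acc_loo, acc_loo = cross_validate(X, y, k=len(y))  # leave-one-out
```

In leave-one-out each fold holds a single instance, so every per-fold accuracy is either 0 or 1 and only the pooled proportion is informative. With k-fold, each fold yields its own accuracy, and how those fold accuracies may be combined and analyzed depends on the independence assumptions the paper discusses.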
All Science Journal Classification (ASJC) codes
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence