Most real-world data analyzed with nonlinear classification techniques are imbalanced in the proportion of examples available for each class. Imbalanced class distributions can lead algorithms to learn overly complex models that overfit the data and have little practical relevance. This study analyzes several classification algorithms used to predict the creditworthiness of a bank's customers from checking account information. A series of experiments was conducted to test the different techniques. The objective is to determine a range of credit scores that a manager could apply in risk management. The results indicate that, by applying the concept of classification with equal quantities, the implicit knowledge can be discovered successfully. A data-cleaning strategy for handling such real cases with imbalanced class distributions is then proposed.
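As an illustration only, the idea of classification with equal quantities can be read as training on the same number of examples from each class. The abstract does not specify the procedure, so the sketch below assumes random undersampling of the larger classes; the function name `balance_by_undersampling`, the record layout, and the toy customer data are all hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def balance_by_undersampling(records, label_of, seed=0):
    """Return a shuffled subset holding the same number of records per class.

    Assumption: equal class sizes are obtained by randomly undersampling
    every class down to the size of the smallest one.
    """
    if not records:
        return []
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)

    # The smallest class sets the number of records kept per class.
    n_per_class = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_per_class))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (checking_account_balance, creditworthiness_label).
customers = [(1200 + i, "good") for i in range(8)] + [(-50, "bad"), (-300, "bad")]
balanced = balance_by_undersampling(customers, label_of=lambda rec: rec[1])
```

Here `balanced` holds two "good" and two "bad" customers, which could then be passed to any classifier. Undersampling discards data from the majority class, so oversampling or class weighting are common alternatives when examples are scarce.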
All Science Journal Classification (ASJC) codes
- Economics, Econometrics and Finance(all)
- Computational Mathematics
- Applied Mathematics