This study proposes a processing method for predicting breast cancer from nine individual attributes, including age, body mass index, glucose, insulin, and a homeostasis model assessment. First, principal component analysis (PCA) was used to identify the most informative parts of the data and to reduce its dimensionality; the top five principal components accounted for a cumulative 99.89% of the variance. A multilayer perceptron (MLP) network was then used to extract features from the data, with the network structure designed to explore how the data behaved as the dimensionality increased or decreased. Accordingly, the model first expands the data into a higher-dimensional representation (exploration) and then compresses it into a lower-dimensional one (development). After training, the model could isolate the representative features, which were then passed, in a transfer-learning manner, to a support vector machine (SVM) classifier. To verify the proposed method, k-fold cross-validation was repeated 50 times and the results were averaged. With 10-fold cross-validation on the dataset of Manuel Gomes from the University Hospital Centre of Coimbra, the method achieved an accuracy of 86.97%. The results indicate that the proposed sequence of processing steps can effectively predict breast cancer. Furthermore, data processed using PCA alone were compared with the features obtained by applying PCA followed by the trained MLP, and the differences between the two were visualized using t-distributed stochastic neighbor embedding (t-SNE).
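The pipeline described above (PCA, then MLP feature extraction, then an SVM evaluated with repeated 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic stand-ins generated with `make_classification`, the helper class `MLPFeatures` is hypothetical, and the hidden-layer sizes `(32, 4)`, the ReLU activation, the default RBF kernel, and the two repeats (instead of the 50 reported) are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


class MLPFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical helper: train an MLP, then expose its last hidden layer as features."""

    def __init__(self, hidden_layer_sizes=(32, 4), max_iter=500, random_state=0):
        # Assumed layout: a wide layer (expand) followed by a narrow layer (compress).
        self.hidden_layer_sizes = hidden_layer_sizes
        self.max_iter = max_iter
        self.random_state = random_state

    def fit(self, X, y):
        self.mlp_ = MLPClassifier(
            hidden_layer_sizes=self.hidden_layer_sizes,
            activation="relu",
            max_iter=self.max_iter,
            random_state=self.random_state,
        )
        self.mlp_.fit(X, y)
        return self

    def transform(self, X):
        # Forward pass through the hidden layers only; the output layer is skipped.
        h = np.asarray(X, dtype=float)
        for W, b in zip(self.mlp_.coefs_[:-1], self.mlp_.intercepts_[:-1]):
            h = np.maximum(h @ W + b, 0.0)  # ReLU
        return h


# Synthetic stand-in for a nine-attribute, two-class dataset (not the Coimbra data).
X, y = make_classification(
    n_samples=116, n_features=9, n_informative=5, n_redundant=2, random_state=0
)

# PCA to five components, MLP-derived features, then an SVM classifier.
model = make_pipeline(PCA(n_components=5), MLPFeatures(), SVC())

# 10-fold cross-validation, repeated and averaged (2 repeats here for speed).
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.4f}")
```

Placing every step inside one pipeline means that PCA and the MLP are refit on the training portion of each fold, so no information from the held-out fold reaches the feature extractor. Accuracy on the synthetic data is not comparable to the figure reported for the real dataset.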
All Science Journal Classification (ASJC) codes
- Computer Science(all)
- Materials Science(all)