Development of automatic breast cancer classification for mammography with convolutional neural network transfer learning and racially diverse datasets

Translated thesis title: 利用遷移學習、卷積神經網路與具種族多樣性之影像資料集開發乳房X光攝影的全自動乳癌分類系統
  • 楊 濰澤

Student thesis: Master's Thesis


Breast cancer is the most common cancer among women. In Taiwan and other Asian countries, the average age at diagnosis is 10 years younger than in Western countries. To detect breast cancer at an early stage, mammography is the most widely used modality. However, such screening usually results in a high recall rate, or a high false-positive rate. For decades, researchers have tried to solve this problem by using image processing methods to build CAD systems. Recently, researchers have started to use deep learning methods, which have shown promising results in building classifiers for whole X-ray images. Nonetheless, experts generally already know the location of the lesion, yet still find some cases difficult to diagnose, especially those in BI-RADS categories 3 and 4. Therefore, in our research, we built a system that relies on manual ROI extraction rather than on whole images. Furthermore, to examine whether the model can reduce the high recall rate, we analyzed its performance on BI-RADS 3 and 4 cases. Lastly, we examined whether a model trained with Western data could be applied to the Asian population. We collected three public datasets (CBIS-DDSM, BCDR, and INbreast) and one private dataset from NCKU Hospital. To predict ROIs of various sizes, we adopted a patch classification-based model and ROI pooling. In the patch classification-based model, a common convolutional neural network is extended to process larger images. Our patch classification-based model can process ROIs of four different sizes, whereas ROI pooling can process ROIs of any size. To address data imbalance, we adjusted the data composition within each mini-batch and used class weights. The patch classification-based model achieved an overall accuracy of 73.1% and an AUC above 0.70 for every ROI size. By comparison, ROI pooling reached an accuracy of only 61%. In addition, we found that when negative cases were 20 times more numerous than positive cases, class weighting achieved an accuracy of only 55%, whereas adjusting the data composition achieved about 65%. In a comparison with human experts, the experts reached an accuracy of only 50% on BI-RADS 3 and 4 cases, whereas our model maintained 67%. Moreover, our model achieved an accuracy of 78% when applied to the NCKU Hospital dataset. These results show that deep learning has the potential to reduce the high recall rate in clinics. They also suggest that a model trained with Western datasets is applicable to the Asian population without any fine-tuning. Although more clinical data are needed to verify our results, the proposed model shows promising results in reducing the recall rate and in its application to the Asian population.
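The two imbalance strategies compared in the abstract, class weights and adjusting the data composition of each mini-batch, can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula or batch ratio, so the inverse-frequency weights, the 50% positive share, and the function names `inverse_frequency_weights` and `balanced_batches` are all assumptions made for illustration, not the thesis's implementation.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


def balanced_batches(labels, batch_size, pos_fraction=0.5, seed=0):
    """Yield index batches with a fixed share of positive (label 1) samples.

    Negatives are visited once per pass; positives are drawn with
    replacement so that every batch keeps the same composition.
    The 50% default is an assumption, not a value from the thesis.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Keep at least one sample of each class in every batch.
    n_pos = min(batch_size - 1, max(1, round(batch_size * pos_fraction)))
    n_neg = batch_size - n_pos
    rng.shuffle(neg)
    for start in range(0, len(neg) - n_neg + 1, n_neg):
        batch = neg[start:start + n_neg] + rng.choices(pos, k=n_pos)
        rng.shuffle(batch)
        yield batch


# Toy labels with 20 negatives per positive, as in the abstract's scenario.
labels = [1] * 10 + [0] * 200
weights = inverse_frequency_weights(labels)
batches = list(balanced_batches(labels, batch_size=16))
```

With class weights, each positive sample contributes more to the loss, but most mini-batches still contain few or no positives. With adjusted batch composition, every gradient step sees both classes, which is one plausible reason the abstract reports better accuracy for that approach at a 20:1 ratio.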
Supervisor: Yu-Hua Dean Fang