Hepatitis C Virus Detection Model by Using Random Forest, Logistic-Regression and ABC Algorithm

Tzuu Hseng S. Li, Huan Jung Chiu, Ping Huan Kuo

Research output: Article, peer-reviewed

16 citations (Scopus)

Abstract

This study proposes an automatic classifier that estimates the multiclass probabilities of hepatitis C virus (HCV) incidence from patients' blood attributes. Its purpose is to establish an artificial intelligence-based model that can identify HCV patients and detect the disease at an early stage to support subsequent treatment. The model can be applied to clinical data and maintains its performance on imbalanced datasets. The innovation of this article lies in addressing the class imbalance found in medical record-based clinical data, for which the synthetic minority oversampling technique (SMOTE) was employed. The classifier uses a cascade two-stage method that combines the random forest (RF) and logistic regression (LR) algorithms. Two models were trained by applying RF (Model 1) and LR (Model 2) to raw and preprocessed data, respectively. The artificial bee colony (ABC) algorithm was then used to search for the optimal threshold value for filtering and separation, that is, the critical threshold that determines the optimal combination of Model 1 and Model 2. Because the two-stage cascade combines algorithms with different search dimensions, it integrates their respective strengths. The 10-fold Monte Carlo cross-validation experiments were repeated 50 times to obtain mean values, and data from the recent pandemic were then used to verify the proposed method. To evaluate the results quantitatively, indicators such as prediction accuracy, precision, recall, F1-score, and Matthews correlation coefficient were compared with those of the latest algorithms used in relevant fields. The results indicate that the proposed model, named Cascade RF-LR (with SMOTE), with its threshold tuned by the ABC algorithm, can detect the multiclass probabilities of HCV incidence, thereby improving the effectiveness of relevant treatments.
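The threshold-based combination of Model 1 and Model 2 can be illustrated with a minimal sketch. The abstract does not state how the threshold routes samples between the two models, so the rule below (keep Model 1's class when its top probability reaches the threshold, otherwise defer to Model 2) is an assumption. The function names, the toy probabilities, and the exhaustive grid search, which stands in for the ABC optimizer, are likewise hypothetical and not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Probs = Sequence[Sequence[float]]  # one row of class probabilities per sample


def argmax(row: Sequence[float]) -> int:
    """Index of the largest probability in a row."""
    return max(range(len(row)), key=lambda k: row[k])


def cascade_predict(p_model1: Probs, p_model2: Probs, threshold: float) -> List[int]:
    """Assumed routing rule: trust Model 1 when it is confident, else use Model 2."""
    preds = []
    for p1, p2 in zip(p_model1, p_model2):
        if max(p1) >= threshold:
            preds.append(argmax(p1))  # Model 1 (RF in the paper) is confident enough
        else:
            preds.append(argmax(p2))  # defer to Model 2 (LR in the paper)
    return preds


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def search_threshold(p_model1: Probs, p_model2: Probs, y_true: Sequence[int],
                     steps: int = 101) -> Tuple[float, float]:
    """Grid search over [0, 1]; a simple stand-in for the ABC search, not ABC itself."""
    best_t, best_acc = 0.0, -1.0
    for i in range(steps):
        t = i / (steps - 1)
        acc = accuracy(y_true, cascade_predict(p_model1, p_model2, t))
        if acc > best_acc:  # keep the first threshold that reaches the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


# Made-up probabilities for four samples and three classes (illustration only).
y_true = [0, 1, 2, 1]
p1 = [[0.90, 0.05, 0.05], [0.50, 0.40, 0.10], [0.10, 0.10, 0.80], [0.45, 0.35, 0.20]]
p2 = [[0.60, 0.30, 0.10], [0.20, 0.70, 0.10], [0.30, 0.40, 0.30], [0.30, 0.60, 0.10]]

best_t, best_acc = search_threshold(p1, p2, y_true)
print(f"threshold={best_t:.2f}, accuracy={best_acc:.2f}")
```

On this toy data neither model alone classifies every sample correctly, while an intermediate threshold lets each model handle the samples it is better at. In practice the threshold would be tuned on validation folds rather than on the data used for the final evaluation.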

Original language: English
Pages (from-to): 91045-91058
Number of pages: 14
Journal: IEEE Access
Volume: 10
Publication status: Published - 2022

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
  • Electrical and Electronic Engineering
