TY - JOUR
T1 - Comparing machine learning with case-control models to identify confirmed dengue cases
AU - Ho, Tzong Shiann
AU - Weng, Ting Chia
AU - Wang, Jung Der
AU - Han, Hsieh Cheng
AU - Cheng, Hao Chien
AU - Yang, Chun Chieh
AU - Yu, Chih Hen
AU - Liu, Yen Jung
AU - Hu, Chien Hsiang
AU - Huang, Chun Yu
AU - Chen, Ming Hong
AU - King, Chwan Chuen
AU - Oyang, Yen Jen
AU - Liu, Ching Chuan
N1 - Funding Information:
The authors sincerely appreciate the financial support from the research grants of the National Health Research Institutes (www.nhri.org.tw) (MR-108-GP-14 (CCK), NHRI-108A1-MRCO-0319191 (TSH)) and the Ministry of Science and Technology (www.most.gov.tw) (MOST-103-2314-B-006-009-MY3 (TSH), MOST-107-2923-B-006-001 (TSH), MOST-108-2923-B-006-001 (TSH)), Taiwan, which made this investigation possible. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2020 Ho et al.
PY - 2020/11
Y1 - 2020/11
N2 - In recent decades, the global incidence of dengue has increased. Affected countries have responded with more effective surveillance strategies to detect outbreaks early, monitor the trends, and implement prevention and control measures. We have applied newly developed machine learning approaches to identify laboratory-confirmed dengue cases from 4,894 emergency department patients with dengue-like illness (DLI) who received laboratory tests. Among them, 60.11% (2,942 cases) were confirmed to have dengue. Using just four input variables [age, body temperature, white blood cell counts (WBCs) and platelets], not only the state-of-the-art deep neural network (DNN) prediction models but also the conventional decision tree (DT) and logistic regression (LR) models delivered areas under the receiver operating characteristic (ROC) curve (AUCs) ranging from 83.75% to 85.87% [for DT, DNN and LR: 84.60% ± 0.03%, 85.87% ± 0.54%, 83.75% ± 0.17%, respectively]. Subgroup analyses found that all the models were very sensitive, particularly in the pre-epidemic period. Pre-peak sensitivities (<35 weeks) were 92.6%, 92.9%, and 93.1% in DT, DNN, and LR, respectively. Adjusted odds ratios examined with LR for low WBCs [≤ 3.2 (×10³/μL)], fever (≥ 38°C), low platelet counts [< 100 (×10³/μL)], and elderly (≥ 65 years) were 5.17 [95% confidence interval (CI): 3.96–6.76], 3.17 [95% CI: 2.74–3.66], 3.10 [95% CI: 2.44–3.94], and 1.77 [95% CI: 1.50–2.10], respectively. Our prediction models can readily be used in resource-poor countries where viral/serologic tests are inconvenient and can also be applied for real-time syndromic surveillance to monitor trends of dengue cases and even be integrated with mosquito/environment surveillance for early warning and immediate prevention/control measures. In other words, a local community hospital/clinic with an instrument for complete blood counts (including platelets) can provide sentinel screening during outbreaks.
In conclusion, the machine learning approach can facilitate medical and public health efforts to minimize the health threat of dengue epidemics. However, laboratory confirmation remains the primary goal of surveillance and outbreak investigation.
AB - In recent decades, the global incidence of dengue has increased. Affected countries have responded with more effective surveillance strategies to detect outbreaks early, monitor the trends, and implement prevention and control measures. We have applied newly developed machine learning approaches to identify laboratory-confirmed dengue cases from 4,894 emergency department patients with dengue-like illness (DLI) who received laboratory tests. Among them, 60.11% (2,942 cases) were confirmed to have dengue. Using just four input variables [age, body temperature, white blood cell counts (WBCs) and platelets], not only the state-of-the-art deep neural network (DNN) prediction models but also the conventional decision tree (DT) and logistic regression (LR) models delivered areas under the receiver operating characteristic (ROC) curve (AUCs) ranging from 83.75% to 85.87% [for DT, DNN and LR: 84.60% ± 0.03%, 85.87% ± 0.54%, 83.75% ± 0.17%, respectively]. Subgroup analyses found that all the models were very sensitive, particularly in the pre-epidemic period. Pre-peak sensitivities (<35 weeks) were 92.6%, 92.9%, and 93.1% in DT, DNN, and LR, respectively. Adjusted odds ratios examined with LR for low WBCs [≤ 3.2 (×10³/μL)], fever (≥ 38°C), low platelet counts [< 100 (×10³/μL)], and elderly (≥ 65 years) were 5.17 [95% confidence interval (CI): 3.96–6.76], 3.17 [95% CI: 2.74–3.66], 3.10 [95% CI: 2.44–3.94], and 1.77 [95% CI: 1.50–2.10], respectively. Our prediction models can readily be used in resource-poor countries where viral/serologic tests are inconvenient and can also be applied for real-time syndromic surveillance to monitor trends of dengue cases and even be integrated with mosquito/environment surveillance for early warning and immediate prevention/control measures. In other words, a local community hospital/clinic with an instrument for complete blood counts (including platelets) can provide sentinel screening during outbreaks.
In conclusion, the machine learning approach can facilitate medical and public health efforts to minimize the health threat of dengue epidemics. However, laboratory confirmation remains the primary goal of surveillance and outbreak investigation.
UR - http://www.scopus.com/inward/record.url?scp=85096029842&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096029842&partnerID=8YFLogxK
U2 - 10.1371/journal.pntd.0008843
DO - 10.1371/journal.pntd.0008843
M3 - Article
C2 - 33170848
AN - SCOPUS:85096029842
VL - 14
SP - 1
EP - 21
JO - PLoS Neglected Tropical Diseases
JF - PLoS Neglected Tropical Diseases
SN - 1935-2727
IS - 11
M1 - e0008843
ER -