Improving the use of mortality data in public health: A comparison of garbage code redistribution models

Ta Chou Ng, Wei Cheng Lo, Chu Chang Ku, Tsung Hsueh Lu, Hsien Ho Lin

Research output: Contribution to journal › Article

Abstract

Objectives: To describe and compare 3 garbage code (GC) redistribution models: naïve Bayes classifier (NB), coarsened exact matching (CEM), and multinomial logistic regression (MLR). Methods: We analyzed Taiwan Vital Registration data (2008-2016) using a 2-step approach. First, we used non-GC death records to evaluate the 3 prediction models (NB, CEM, and MLR), incorporating individual-level information on multiple causes of death (MCDs) and demographic characteristics. Second, we applied the best-performing model to GC death records to predict their underlying causes of death. We also conducted simulation analyses to evaluate the predictive performance of the models. Results: When MCDs were not accounted for, all 3 models had high average misclassification rates in GC assignment (NB, 81%; CEM, 86%; MLR, 81%). When MCD information was included, NB and MLR showed significant improvement in assignment accuracy (misclassification rates of 19% and 17%, respectively). By contrast, CEM without a variable selection procedure had a substantially higher misclassification rate (40%). Conclusions: Comparing potential GC redistribution approaches provides guidance for obtaining better estimates of the cause-of-death distribution and highlights the importance of MCD information for vital registration system reform.
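The abstract does not give implementation details, so the following is only a minimal sketch of the 2-step idea using a naïve Bayes classifier: fit on non-GC records, measure the misclassification rate on held-out non-GC records, then reassign GC records. Everything here is an assumption for illustration: the record layout, the Bernoulli treatment of MCD codes, the age and sex categories, Laplace smoothing with `alpha=1.0`, the hypothetical helper names `train_nb`, `predict_nb`, and `misclassification_rate`, and the toy ICD-10 codes (with I50 treated as a garbage code). It is not the authors' model or data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical record layout (an assumption, not the study's schema):
# (underlying_cause, other_listed_codes, age_group, sex), where
# other_listed_codes is a frozenset of MCD codes excluding the underlying cause.


def train_nb(records, alpha=1.0):
    """Fit a naive Bayes model on non-GC records with Laplace smoothing."""
    class_counts = Counter(cause for cause, _, _, _ in records)
    code_counts = defaultdict(Counter)  # cause -> number of records listing each code
    demo_counts = defaultdict(Counter)  # cause -> counts of (feature, value) pairs
    vocab = set()
    demo_values = defaultdict(set)
    for cause, codes, age, sex in records:
        code_counts[cause].update(codes)
        vocab.update(codes)
        demo_counts[cause].update([("age", age), ("sex", sex)])
        demo_values["age"].add(age)
        demo_values["sex"].add(sex)
    return {"class_counts": class_counts, "code_counts": code_counts,
            "demo_counts": demo_counts, "vocab": vocab,
            "demo_values": demo_values, "n": len(records), "alpha": alpha}


def predict_nb(model, codes, age, sex):
    """Return the underlying cause with the highest posterior log-score."""
    a = model["alpha"]
    best, best_score = None, -math.inf
    for cause, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior
        # Bernoulli term for every code seen in training (present or absent);
        # codes never seen in training are ignored.
        for code in model["vocab"]:
            p = (model["code_counts"][cause][code] + a) / (n_c + 2 * a)
            score += math.log(p if code in codes else 1 - p)
        # Categorical terms for the demographic features.
        for feat, value in (("age", age), ("sex", sex)):
            k = len(model["demo_values"][feat])
            p = (model["demo_counts"][cause][(feat, value)] + a) / (n_c + k * a)
            score += math.log(p)
        if score > best_score:
            best, best_score = cause, score
    return best


def misclassification_rate(model, records):
    """Share of records whose predicted underlying cause is wrong."""
    wrong = sum(predict_nb(model, codes, age, sex) != cause
                for cause, codes, age, sex in records)
    return wrong / len(records)


# Step 1: fit on toy non-GC records and evaluate on held-out non-GC records.
non_gc_train = [
    ("I21", frozenset({"I10", "E11"}), "65+", "M"),
    ("I21", frozenset({"I10"}), "65+", "F"),
    ("C34", frozenset({"J18"}), "65+", "M"),
    ("C34", frozenset({"J18", "C78"}), "45-64", "M"),
]
non_gc_test = [
    ("I21", frozenset({"E11"}), "65+", "M"),
    ("C34", frozenset({"C78"}), "45-64", "F"),
]
model = train_nb(non_gc_train)
print(misclassification_rate(model, non_gc_test))  # → 0.0

# Step 2: reassign GC records; their recorded GC cause (I50) is not used.
gc_records = [
    ("I50", frozenset({"I10"}), "65+", "F"),
    ("I50", frozenset({"J18"}), "45-64", "M"),
]
print([predict_nb(model, codes, age, sex) for _, codes, age, sex in gc_records])
# → ['I21', 'C34']
```

Dropping the MCD codes from the features (keeping only age and sex) leaves the classifier with almost nothing to separate causes, which is the intuition behind the much higher misclassification rates the abstract reports when MCDs are not accounted for.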

Original language: English
Pages (from-to): 222-229
Number of pages: 8
Journal: American Journal of Public Health
Volume: 110
Issue number: 2
Publication status: Published - 2020 Jan 1

All Science Journal Classification (ASJC) codes

  • Public Health, Environmental and Occupational Health