TY - JOUR
T1 - Developing a stroke severity index based on administrative data was feasible using data mining techniques
AU - Sung, Sheng Feng
AU - Hsieh, Cheng Yang
AU - Kao, Yea-Huei
AU - Lin, Huey Juan
AU - Chen, Chih-Hung
AU - Chen, Yu Wei
AU - Hu, Ya Han
PY - 2015/11/1
Y1 - 2015/11/1
N2 - Objectives: Case-mix adjustment is difficult for stroke outcome studies using administrative data. However, relevant prescription, laboratory, procedure, and service claims might be surrogates for stroke severity. This study proposes a method for developing a stroke severity index (SSI) by using administrative data. Study Design and Setting: We identified 3,577 patients with acute ischemic stroke from a hospital-based registry and analyzed claims data with plenty of features. Stroke severity was measured using the National Institutes of Health Stroke Scale (NIHSS). We used two data mining methods and conventional multiple linear regression (MLR) to develop prediction models, comparing the model performance according to the Pearson correlation coefficient between the SSI and the NIHSS. We validated these models in four independent cohorts by using hospital-based registry data linked to a nationwide administrative database. Results: We identified seven predictive features and developed three models. The k-nearest neighbor model (correlation coefficient, 0.743; 95% confidence interval: 0.737, 0.749) performed slightly better than the MLR model (0.742; 0.736, 0.747), followed by the regression tree model (0.737; 0.731, 0.742). In the validation cohorts, the correlation coefficients were between 0.677 and 0.725 for all three models. Conclusion: The claims-based SSI enables adjusting for disease severity in stroke studies using administrative data.
AB - Objectives: Case-mix adjustment is difficult for stroke outcome studies using administrative data. However, relevant prescription, laboratory, procedure, and service claims might be surrogates for stroke severity. This study proposes a method for developing a stroke severity index (SSI) by using administrative data. Study Design and Setting: We identified 3,577 patients with acute ischemic stroke from a hospital-based registry and analyzed claims data with plenty of features. Stroke severity was measured using the National Institutes of Health Stroke Scale (NIHSS). We used two data mining methods and conventional multiple linear regression (MLR) to develop prediction models, comparing the model performance according to the Pearson correlation coefficient between the SSI and the NIHSS. We validated these models in four independent cohorts by using hospital-based registry data linked to a nationwide administrative database. Results: We identified seven predictive features and developed three models. The k-nearest neighbor model (correlation coefficient, 0.743; 95% confidence interval: 0.737, 0.749) performed slightly better than the MLR model (0.742; 0.736, 0.747), followed by the regression tree model (0.737; 0.731, 0.742). In the validation cohorts, the correlation coefficients were between 0.677 and 0.725 for all three models. Conclusion: The claims-based SSI enables adjusting for disease severity in stroke studies using administrative data.
UR - http://www.scopus.com/inward/record.url?scp=84945475138&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84945475138&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2015.01.009
DO - 10.1016/j.jclinepi.2015.01.009
M3 - Article
C2 - 25700940
AN - SCOPUS:84945475138
VL - 68
SP - 1292
EP - 1300
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
SN - 0895-4356
IS - 11
ER -