A Study on Early Risk Assessment Techniques for Chronic Diseases by Mining Large-Scale Clinical Databases

  • 金 聚鈺

Student thesis: Doctoral Thesis


In recent years, the volume of electronic medical records (EMRs) has increased rapidly. Hence, obtaining valuable knowledge from EMRs to support medical decision making has become an important issue. To address this issue, in this thesis we propose a set of novel early risk assessment methods for different chronic diseases by identifying diverse disease risk factors from the National Health Insurance Research Database (NHIRD).

First, we propose a Disease Risk Association Pattern Mining Framework (DR-APM) to detect early risk for chronic diseases, with rheumatoid arthritis used as a case study. The main strategies of DR-APM include mining of disease risk patterns, associative classification, analysis with Risk Pattern Matching in PubMed (RPM-PubMed), and statistical analysis. The RPM-PubMed experiments show that the risk patterns discovered through DR-APM can be organized into a well-known risk pattern type and a potentially novel risk pattern type. The statistical analysis experiments reveal significant differences in the disease categories of risk pattern distributions between the disease group and the control group. Based on these significant differences, DR-APM can achieve excellent accuracy in early risk assessment.

Second, to deal with the large number of disease coding attributes and the sparse matrix problem in EMR databases, we propose early Disease Risk Assessment with the Matrix factorization method (eDRAM), which fuses machine learning and matrix factorization to identify latent risk factors from the EMR database. eDRAM uses a non-negative matrix decomposition algorithm to significantly reduce the data dimension and reconstruct novel risk factors for early disease risk assessment. The experiments demonstrate that eDRAM can reduce a large number of attributes and maintain better efficiency, stability, and effectiveness compared to other state-of-the-art methods.

Finally, in recent years deep learning has achieved excellent performance in feature recognition. However, model training is time-consuming and resource-intensive, especially when dealing with large-scale attributes and data. To solve these problems, assess different types of diseases, and improve accuracy, we propose an effective method called scalable Deep learning of Temporal generalized EHRs (sDT-EHRs). sDT-EHRs includes a novel temporal EHR representation model with an extraction algorithm, a random sampling method, and a deep residual convolutional neural network. To evaluate the effectiveness of sDT-EHRs for early risk assessment of multiple diseases, three chronic diseases (chronic obstructive pulmonary disease, systemic lupus erythematosus, and type 2 diabetes mellitus) were assessed in the experiments, and sDT-EHRs was compared with state-of-the-art methods via a large-scale nationwide medical database. Experimental evaluations of performance, scalability, and applicability to multiple chronic diseases yielded three major findings. First, the proposed EHR representation model is a combination of generalized disease codes that increases efficiency during the training phase. Second, sDT-EHRs outperforms other state-of-the-art methods in the risk assessment of the three chronic diseases. Finally, sDT-EHRs demonstrates good scalability: it can assess disease risk based on disease models constructed from relatively small amounts of patient data and maintain high performance when evaluating a large number of patients.

This research mainly considers the needs of modern precision medicine and systematically investigates and develops a set of early disease risk assessment frameworks based on data mining, machine learning, and deep learning techniques. Using real-world large-scale medical data for the early risk assessment of different chronic diseases, we design a set of experiments to evaluate the improvement of the proposed methods in terms of efficiency and effectiveness. The main contribution of this study is to discover a variety of novel risk factors and improve early risk assessment methods, which can support further medical validation, analysis, and assessment of different diseases to improve medical care.
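
The dimension-reduction idea behind eDRAM can be illustrated with a minimal sketch of non-negative matrix factorization. The abstract does not say which decomposition algorithm the thesis uses, so the multiplicative-update rule below is an assumption chosen because it is a common textbook variant. The toy patient-by-diagnosis-code matrix `visits`, the function names, and the rank of 2 are hypothetical and serve only as illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) into w (n x rank) and h (rank x m).

    Uses multiplicative updates, which keep every entry non-negative.
    This is an assumed, generic variant, not the thesis's exact algorithm.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: rows are patients, columns are diagnosis codes,
# and 1 means the code appears in the patient's record.
visits = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

# Six sparse code columns are compressed into two latent factors per patient.
patient_factors, code_factors = nmf(visits, rank=2)
print("reconstruction error:", frobenius_error(visits, patient_factors, code_factors))
```

Each row of `patient_factors` is a low-dimensional description of one patient that a downstream classifier could use in place of the original sparse code columns. Each row of `code_factors` shows which codes contribute to a given latent factor.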
Date of Award: 2019
Original language: English
Supervisor: Sun-Yuan Hsieh
