Cancer has ranked among the leading causes of death in Taiwan for several years. The causes of cancer are complex, and cancer susceptibility can be explained by genetic, lifestyle, and environmental components. Some genetic variants are related to cancer susceptibility. In recent years, Next Generation Sequencing (NGS) techniques have flourished, making it possible to obtain human Whole Genome Sequence (WGS) data rapidly and accurately. Structural deletion variants are structural DNA variations that affect phenotypes through the loss of biological function. We can therefore explore the causes of cancer from the human genome and examine the differences between cancer patients and non-cancer people. In our research, structural deletion variants are associated with cancer susceptibility and can be used to distinguish between cancer patients and non-cancer people.

We design a system that aims to find germline structural deletions that can distinguish between cancer and non-cancer samples and that are associated with prognosis. The system is divided into three parts. First, in the structural deletion detection part, germline whole genome sequence data from cancer patients and non-cancer samples are input to PopDel to detect structural deletions from the germline DNA sequence data. After filtering the structural deletions, we retain the higher-coverage deletions. Second, in the structural deletion selection part, we use a machine learning approach, an attention weighted model, to select cancer-associated deletions. We then combine the gene expression profile from cancer tissue RNA-seq data with the patients' clinical information to find immune- and prognosis-associated deletions. Last, in the prognosis part, we apply survival-SVM to select candidate deletions associated with recurrence, and select the significant prognostic deletions by survival analysis.

We conduct our experiments on whole genome sequence data from 192 cancer patients at NCKUH and 499 non-cancer samples from Taiwan Biobank. The 192 cancer patients cover four types of cancer: 8 breast cancer, 120 colorectal cancer, 29 endometrial cancer, and 35 ovarian cancer. PopDel detects 14,772 deletions, of which 2,919 remain after filtering out the lower-coverage deletions. We select the 671 cancer-associated deletions with the highest weights in the attention weighted model and show that cancer and non-cancer samples can be separated. We then choose 160 immune-associated deletions from the immune correlation model, indicating that some structural deletions are related to immune gene expression. Finally, we pick 65 candidate deletions that are correlated with recurrence and use Cox's proportional hazards model to select 8 prognostic deletions. The prognostic deletions are correlated with some tumor marker genes in cancer tissue. Among them, 5 deletions are associated with better prognosis and the other 3 with poor prognosis.
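To illustrate the kind of survival comparison behind the prognosis step, the following is a minimal sketch of a Kaplan-Meier estimate of recurrence-free survival for carriers and non-carriers of a single deletion. It is not the thesis's method: the thesis uses survival-SVM and Cox's proportional hazards model, and Kaplan-Meier is used here only as a simpler, related technique. The function name `kaplan_meier`, the variable names, and all follow-up values are hypothetical and chosen for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient (e.g. months to recurrence or censoring)
    events : 1 if recurrence was observed at that time, 0 if the patient was censored
    """
    recurrences = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = recurrences.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy follow-up data (months), NOT taken from the thesis.
    carrier_times, carrier_events = [6, 10, 14, 30, 36], [1, 1, 0, 1, 0]
    noncarrier_times, noncarrier_events = [12, 24, 30, 40, 48], [0, 1, 0, 0, 0]

    print("carriers:    ", kaplan_meier(carrier_times, carrier_events))
    print("non-carriers:", kaplan_meier(noncarrier_times, noncarrier_events))
```

A curve that stays higher for one group suggests better prognosis for that group; a formal comparison would still require a test or model such as the Cox regression named in the abstract.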
Using Deletion Structural Variants and Immune Response Gene Expression Profiles to Predict Clinical Outcome of Cancer Patients
芷榕, 李. (Author). 2019
Student thesis: Doctoral Thesis