Large-Scale Biomedical Literature Mining for Cross-Document Relation Extraction toward Drug Repurposing

  • 朱 浚煌

Student thesis: Doctoral Thesis

Abstract

Diseases are generally caused by mutations of genes in the human body. For example, an oncogene, a normal gene that has been abnormally mutated, transforms normal cells into tumors. On the other hand, the mutation of a tumor suppressor gene, a gene that prevents a normal cell from becoming a tumor, leads normal cells to dysfunction. The role of a drug in disease treatment is basically to deal with the mutations of genes; therefore, drugs, genes, and diseases are closely bound up with each other. Developing a drug and bringing it to market takes up to hundreds of millions of U.S. dollars and more than ten years. Much of this time and cost can be saved if approved drugs can be repositioned using existing resources, that is, by drug repurposing. Though drug repurposing could play an important role in future drug development and disease therapeutics, the complexity of biological mechanisms and the prevalence of information technology have led to rapid growth in publicly accessible biomedical resources. Hence, the objective of this research is to extract the relationships between drugs, genes, and diseases in order to further explore new indications for drugs. In this dissertation, we proposed a novel method to identify protein-protein interactions through semantic similarity measures among protein mentions. Moreover, we reduced a large volume of biomedical literature using a machine learning approach, with features generated by information retrieval techniques, to facilitate finding important documents. Finally, we utilized natural language processing methods to infer indirect drug-disease relationships from large-scale biomedical literature, and confirmed the suitability of the identified drug candidates for repurposing as anticancer drugs by conducting a manual review of the literature and of clinical trials.
Date of Award: 2015 Aug 19
Original language: English
Supervisor: Jung-Hsien Chiang
