tmVar: A text mining approach for extracting sequence variants in biomedical literature

Chih Hsuan Wei, Bethany R. Harris, Hung Yu Kao, Zhiyong Lu

Research output: Article, peer-reviewed

96 citations (Scopus)

Abstract

Motivation: Text-mining mutation information from the literature has become a critical part of the bioinformatics approach to the analysis and interpretation of sequence variations in complex diseases in the post-genomic era. It has also been used to assist the creation of disease-related mutation databases. Most existing approaches are rule-based and focus on limited types of sequence variations, such as protein point mutations. Thus, extending their extraction scope requires significant manual effort in examining new instances and developing corresponding rules. As such, new automatic approaches are greatly needed for extracting different kinds of mutations with high accuracy.

Results: Here, we report tmVar, a text-mining approach based on conditional random field (CRF) for extracting a wide range of sequence variants described at protein, DNA and RNA levels according to a standard nomenclature developed by the Human Genome Variation Society. By doing so, we cover several important types of mutations that were not considered in past studies. Using a novel CRF label model and feature set, our method achieves higher performance than a state-of-the-art method on both our corpus (91.4% versus 78.1% in F-measure) and their own gold standard (93.9% versus 89.4% in F-measure). These results suggest that tmVar is a high-performance method for mutation extraction from biomedical literature.

Original language: English
Pages (from-to): 1433-1439
Number of pages: 7
Journal: Bioinformatics
Volume: 29
Issue number: 11
DOIs
Publication status: Published - 1 June 2013

All Science Journal Classification (ASJC) codes

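  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics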
