AutoBind

Automatic extraction of protein-ligand-binding affinity data from biological literature

Tien-Hao Chang, Chao Hsuan Ke, Jung Hsin Lin, Jung-Hsien Chiang

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Motivation: Determination of the binding affinity of a protein-ligand complex is important to quantitatively specify whether a particular small molecule will bind to the target protein. In addition, collecting comprehensive datasets of protein-ligand complexes and their corresponding binding affinities is crucial for developing accurate scoring functions that predict the binding affinities of previously unknown protein-ligand complexes. In the past decades, several databases of protein-ligand-binding affinities have been created via visual extraction from the literature. However, such approaches are time-consuming, and most of these databases are updated only a few times per year. Hence, there is an immediate demand for a high-precision automatic extraction method for collecting binding affinities.

Results: We have created a new database of protein-ligand-binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles in which the binding affinities had been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles, as well as a scoring function that ranks the sentences matching these patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13 616 protein-ligand complexes and their corresponding binding affinities, extracted from 17 221 articles, have been deposited in AutoBind.
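The abstract does not list AutoBind's four sentence patterns or define its scoring function, so the following is only a minimal sketch of the general idea under stated assumptions. A single hypothetical regular expression (`AFFINITY_RE`) stands in for the patterns, a toy cue-word count (`score_sentence`, `CUE_WORDS`) stands in for the scoring function, and all names here are illustrative, not AutoBind code. The sketch also shows how precision and recall, the two metrics reported above, are computed from a set of predicted extractions and a manually annotated gold set.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL stand-in for AutoBind's four sentence patterns (not given in the abstract):
# an affinity measure, an optional linking phrase, a number, and a molar concentration unit.
AFFINITY_RE = re.compile(
    r"\b(?P<measure>K[id]|IC50|EC50)\b\s*(?:value\s+)?(?:of|=|was|is)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[mµunp]?M)\b"
)
# HYPOTHETICAL cue words used by the toy scoring function below.
CUE_WORDS = ("bind", "inhibit", "affinity", "complex")


@dataclass
class Hit:
    sentence: str
    measure: str
    value: float
    unit: str
    score: int


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace, so decimals like 3.5 survive.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence: str) -> int:
    # Toy score: one point per affinity match plus one point per cue word present.
    lowered = sentence.lower()
    return len(AFFINITY_RE.findall(sentence)) + sum(w in lowered for w in CUE_WORDS)


def extract(text: str) -> list[Hit]:
    hits = []
    for sentence in split_sentences(text):
        score = score_sentence(sentence)
        for m in AFFINITY_RE.finditer(sentence):
            hits.append(Hit(sentence, m["measure"], float(m["value"]), m["unit"], score))
    # Rank matches by their sentence score, highest first (ties keep document order).
    return sorted(hits, key=lambda h: h.score, reverse=True)


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    # Precision = TP / |predicted|; recall = TP / |gold| (0.0 when a set is empty).
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = ("Compound 7 binds the kinase with a Ki of 12 nM. "
            "The IC50 value of 3.5 µM was measured for the parent scaffold.")
    for hit in extract(text):
        print(hit.score, hit.measure, hit.value, hit.unit)
```

Precision penalizes spurious extractions while recall penalizes missed ones, which is why an extraction system is usually reported with both figures rather than either alone.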

Original language: English
Article number: bts367
Pages (from-to): 2162-2168
Number of pages: 7
Journal: Bioinformatics
Volume: 28
Issue number: 16
DOI: 10.1093/bioinformatics/bts367
Publication status: Published - 2012 Aug 1

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

@article{412996d2d335485dae9139bdeb1dba28,
title = "AutoBind: Automatic extraction of protein-ligand-binding affinity data from biological literature",
author = "Chang, Tien-Hao and Ke, {Chao Hsuan} and Lin, {Jung Hsin} and Chiang, Jung-Hsien",
year = "2012",
month = aug,
day = "1",
doi = "10.1093/bioinformatics/bts367",
language = "English",
volume = "28",
pages = "2162--2168",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "16",
}

AutoBind: Automatic extraction of protein-ligand-binding affinity data from biological literature. / Chang, Tien-Hao; Ke, Chao Hsuan; Lin, Jung Hsin; Chiang, Jung-Hsien.

In: Bioinformatics, Vol. 28, No. 16, bts367, 01.08.2012, p. 2162-2168.

TY  - JOUR
T1  - AutoBind: Automatic extraction of protein-ligand-binding affinity data from biological literature
AU  - Chang, Tien-Hao
AU  - Ke, Chao Hsuan
AU  - Lin, Jung Hsin
AU  - Chiang, Jung-Hsien
PY  - 2012/8/1
Y1  - 2012/8/1
AB  - Motivation: Determination of the binding affinity of a protein-ligand complex is important to quantitatively specify whether a particular small molecule will bind to the target protein. Besides, collection of comprehensive datasets for protein-ligand complexes and their corresponding binding affinities is crucial in developing accurate scoring functions for the prediction of the binding affinities of previously unknown protein-ligand complexes. In the past decades, several databases of protein-ligand-binding affinities have been created via visual extraction from literature. However, such approaches are time-consuming and most of these databases are updated only a few times per year. Hence, there is an immediate demand for an automatic extraction method with high precision for binding affinity collection. Result: We have created a new database of protein-ligand-binding affinity data, AutoBind, based on automatic information retrieval. We first compiled a collection of 1586 articles where the binding affinities have been marked manually. Based on this annotated collection, we designed four sentence patterns that are used to scan full-text articles as well as a scoring function to rank the sentences that match our patterns. The proposed sentence patterns can effectively identify the binding affinities in full-text articles. Our assessment shows that AutoBind achieved 84.22% precision and 79.07% recall on the testing corpus. Currently, 13 616 protein-ligand complexes and the corresponding binding affinities have been deposited in AutoBind from 17 221 articles.
UR  - http://www.scopus.com/inward/record.url?scp=84865086069&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84865086069&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/bts367
DO  - 10.1093/bioinformatics/bts367
M3  - Article
VL  - 28
SP  - 2162
EP  - 2168
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 16
M1  - bts367
ER  -