TY - JOUR
T1 - AutoBind
T2 - Automatic extraction of protein-ligand-binding affinity data from biological literature
AU - Chang, Darby Tien Hao
AU - Ke, Chao Hsuan
AU - Lin, Jung Hsin
AU - Chiang, Jung Hsien
N1 - Funding Information:
Funding: This work was supported by the National Science Council, Taiwan [NSC99-2221-E-006-127-MY3 and NSC100-2627-B-006-011].
PY - 2012/8
Y1 - 2012/8
N2 - Motivation: Determining the binding affinity of a protein-ligand complex is important for quantitatively specifying whether a particular small molecule will bind to the target protein. In addition, comprehensive datasets of protein-ligand complexes and their corresponding binding affinities are crucial for developing accurate scoring functions that predict the binding affinities of previously unknown protein-ligand complexes. Over the past decades, several databases of protein-ligand-binding affinities have been created by visually extracting data from the literature. However, such approaches are time-consuming, and most of these databases are updated only a few times per year. Hence, there is an immediate need for a high-precision automatic method for collecting binding affinity data. Result: We have created AutoBind, a new database of protein-ligand-binding affinity data built by automatic information retrieval. We first compiled a collection of 1586 articles in which the binding affinities had been marked manually. Based on this annotated collection, we designed four sentence patterns for scanning full-text articles, together with a scoring function that ranks the sentences matching these patterns. The proposed sentence patterns effectively identify binding affinities in full-text articles: on the testing corpus, AutoBind achieved 84.22% precision and 79.07% recall. Currently, AutoBind contains 13 616 protein-ligand complexes and their binding affinities, extracted from 17 221 articles.
UR - http://www.scopus.com/inward/record.url?scp=84865086069&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865086069&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bts367
DO - 10.1093/bioinformatics/bts367
M3 - Article
C2 - 22753780
AN - SCOPUS:84865086069
SN - 1367-4803
VL - 28
SP - 2162
EP - 2168
JO - Bioinformatics
JF - Bioinformatics
IS - 16
M1 - bts367
ER -