TY - JOUR
T1 - Characterization and distribution of repetitive elements in association with genes in the human genome
AU - Liang, Kai Chiang
AU - Tseng, Joseph T.
AU - Tsai, Shaw Jenq
AU - Sun, H. Sunny
N1 - Funding Information:
The authors would like to express great appreciation to the Bioinformatics Core at National Cheng Kung University for providing assistance in computational analyses. This work was supported by the National Science Council , Taiwan ( NSC 101-2320-B-006-017-MY2 ).
Publisher Copyright:
© 2015 Elsevier Ltd. All rights reserved.
PY - 2015/5/16
Y1 - 2015/5/16
N2 - Repetitive elements constitute more than 50% of the human genome. Recent studies implied that the complexity of living organisms is not just a direct outcome of a number of coding Sequences; the repetitive elements, which do not encode proteins, may also play a significant role. Though scattered studies showed that repetitive elements in the regulatory regions of a gene control gene expression, no systematic survey has been done to report the characterization and distribution of various types of these repetitive elements in the human genome. Sequences from 5′ and 3′ untranslated regions and upstream and downstream of a gene were downloaded from the Ensembl database. The repetitive elements in the neighboring of each gene were identified and classified using cross-matching implemented in the RepeatMasker. The annotation and distribution of distinct classes of repetitive elements associated with individual gene were collected to characterize genes in association with different types of repetitive elements using systems biology program. We identified a total of 1,068,400 repetitive elements which belong to 37-class families and 1235 subclasses that are associated with 33,761 genes and 57,365 transcripts. In addition, we found that the tandem repeats preferentially locate proximal to the transcription start site (TSS) of genes and the major function of these genes are involved in developmental processes. On the other hand, interspersed repetitive elements showed a tendency to be accumulated at distal region from the TSS and the function of interspersed repeat-containing genes took part in the catabolic/metabolic processes. Results from the distribution analysis were collected and used to construct a gene-based repetitive element database (GBRED; http://www.binfo.ncku.edu.tw/GBRED/index.html). A user-friendly web interface was designed to provide the information of repetitive elements associated with any particular gene(s). This is the first study focusing on the gene-associated repetitive elements in the human genome. Our data showed distinct genes associated with different kinds of repetitive element and implied such combination may shape the function of these genes. Aside from the conventional view of these elements in genome evolution, results from this study offer a systemic review to facilitate exploitation of these elements in genome function.
AB - Repetitive elements constitute more than 50% of the human genome. Recent studies implied that the complexity of living organisms is not just a direct outcome of a number of coding Sequences; the repetitive elements, which do not encode proteins, may also play a significant role. Though scattered studies showed that repetitive elements in the regulatory regions of a gene control gene expression, no systematic survey has been done to report the characterization and distribution of various types of these repetitive elements in the human genome. Sequences from 5′ and 3′ untranslated regions and upstream and downstream of a gene were downloaded from the Ensembl database. The repetitive elements in the neighboring of each gene were identified and classified using cross-matching implemented in the RepeatMasker. The annotation and distribution of distinct classes of repetitive elements associated with individual gene were collected to characterize genes in association with different types of repetitive elements using systems biology program. We identified a total of 1,068,400 repetitive elements which belong to 37-class families and 1235 subclasses that are associated with 33,761 genes and 57,365 transcripts. In addition, we found that the tandem repeats preferentially locate proximal to the transcription start site (TSS) of genes and the major function of these genes are involved in developmental processes. On the other hand, interspersed repetitive elements showed a tendency to be accumulated at distal region from the TSS and the function of interspersed repeat-containing genes took part in the catabolic/metabolic processes. Results from the distribution analysis were collected and used to construct a gene-based repetitive element database (GBRED; http://www.binfo.ncku.edu.tw/GBRED/index.html). A user-friendly web interface was designed to provide the information of repetitive elements associated with any particular gene(s). This is the first study focusing on the gene-associated repetitive elements in the human genome. Our data showed distinct genes associated with different kinds of repetitive element and implied such combination may shape the function of these genes. Aside from the conventional view of these elements in genome evolution, results from this study offer a systemic review to facilitate exploitation of these elements in genome function.
UR - http://www.scopus.com/inward/record.url?scp=84939598978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939598978&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2015.02.007
DO - 10.1016/j.compbiolchem.2015.02.007
M3 - Article
C2 - 25748288
AN - SCOPUS:84939598978
SN - 1476-9271
VL - 57
SP - 29
EP - 38
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
ER -