The length distributions of simple tandem repeats in the genomes of several organisms are evaluated and, for most eukaryotes, found to exhibit long-range correlations in repeats composed of the A and T nucleotide bases. In particular, the length distributions of the mononucleotide A/T repeat units have longer tails than those of the C/G repeat units. Moreover, the length distributions of the dinucleotide repeat unit CG decrease rapidly and monotonically, whereas those of the repeat units AT, AG and AC have complicated structures at larger repeat lengths, especially for human, mouse and rat chromosomes. These distributional behaviors are due to the CpG deficiency in different genomes with different methylation activities. In particular, methyltransferases in vertebrates appear to methylate specifically the cytosine in CpG dinucleotides, and methylated cytosines are prone to mutate to thymine by spontaneous deamination, so that the dinucleotide CpG gradually decays into TpG and CpA. In addition, there is a peak in the distributions of repeat unit A at a repeat-repeat separation of 153 nt for humans and chimpanzees. We show that the long-tail behavior of the mononucleotide repeat unit A and the peak at a repeat separation of 153 nt are due to the interspersed repetitive DNA sequences in humans and chimpanzees.
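As an illustration of the quantities discussed above, the following is a minimal Python sketch of how a repeat length distribution and a repeat-repeat separation histogram might be tabulated for a single strand. It is not the authors' method. The function names `repeat_length_distribution` and `repeat_separations` are hypothetical, a "repeat" is assumed to be a maximal run of exact copies of the unit, and "separation" is assumed to mean the start-to-start distance between consecutive runs, which may differ from the definition used in the study.

```python
import re
from collections import Counter


def _runs(sequence, unit):
    """Yield (start, copies) for each maximal exact tandem run of `unit`.

    Assumption: runs are found left to right on one strand, without overlap.
    """
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    for match in pattern.finditer(sequence.upper()):
        yield match.start(), len(match.group(0)) // len(unit)


def repeat_length_distribution(sequence, unit):
    """Map run length (number of copies of `unit`) to the number of runs."""
    counts = Counter(copies for _, copies in _runs(sequence, unit))
    return dict(sorted(counts.items()))


def repeat_separations(sequence, unit, min_copies=1):
    """Histogram of start-to-start distances between consecutive runs.

    Assumption: only runs with at least `min_copies` copies are considered.
    """
    starts = [s for s, copies in _runs(sequence, unit) if copies >= min_copies]
    counts = Counter(b - a for a, b in zip(starts, starts[1:]))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    toy = "AAATTAACGCGAT"  # toy sequence, not real genomic data
    print(repeat_length_distribution(toy, "A"))
    print(repeat_length_distribution(toy, "CG"))
    print(repeat_separations(toy, "A"))
```

On a real chromosome the same counts would be accumulated over the full sequence, and the tail of the histogram for unit A could then be compared with that for unit C or for the dinucleotide CG.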
All Science Journal Classification (ASJC) codes
- Agricultural and Biological Sciences (miscellaneous)
- Applied Mathematics