Motivation: In current databases, many genes have cytogenetic annotations that are inconsistent with their sequence map positions. However, not all inconsistencies are alike. Some are genuinely problematic and should be corrected, while others arise from the imprecise nature of chromosomal banding and may be tolerable. It is therefore important to stratify cytogenetic position information into confidence groups that take the imprecision of cytogenetic banding into account.

Results: When cytogenetic annotations are plotted against sequence map positions on a 2D plane, genes with consistent positions tend to form a compact linear distribution, whereas genes with inconsistent positions are more scattered. The areas where these two groups overlap are defined as the tolerable imprecision zones by linear regression and distance analysis. The system was implemented using sequence information from NCBI Map Viewer Build 36.3 and cytogenetic annotations from NCBI Entrez Gene. Each gene's position information is classified into one of five confidence groups: inconsistent-intolerable, inconsistent-tolerable, consistent-imprecise, consistent-precise and consistent-rough. With NCBI Map Viewer Build 36.3 and NCBI Entrez Gene, the percentages of these confidence groups are 1.4%, 7.0%, 54.0%, 35.4% and 2.2%, respectively. With NCBI Map Viewer Build 36.3 and NCBI Online Mendelian Inheritance in Man (OMIM), the percentages are 3.7%, 16.9%, 49.0%, 19.0% and 11.4%, respectively. Combining these two results, a confidence table of genes' position information was constructed.
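The regression-and-distance idea described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each cytogenetic annotation has already been converted to a single numeric coordinate (for example a band midpoint in Mb), that the line is fitted to the consistent genes only, and that the tolerable zone is bounded by the largest distance any consistent gene has from that line. The function names, the tolerance rule and the toy coordinates are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def distance_to_line(x, y, slope, intercept):
    """Perpendicular distance from the point (x, y) to the fitted line."""
    return abs(slope * x - y + intercept) / (slope ** 2 + 1) ** 0.5


def classify_inconsistent(x, y, slope, intercept, tolerance):
    """Label an inconsistent gene by its distance from the fitted line."""
    d = distance_to_line(x, y, slope, intercept)
    if d <= tolerance:
        return "inconsistent-tolerable"
    return "inconsistent-intolerable"


# Toy data (made up): (sequence position in Mb, cytogenetic coordinate in Mb).
consistent = [(10.0, 10.5), (20.0, 19.5), (30.0, 30.5), (40.0, 39.5)]
inconsistent = {"geneA": (25.0, 25.3), "geneB": (25.0, 60.0)}

slope, intercept = fit_line([x for x, _ in consistent],
                            [y for _, y in consistent])

# Assumed rule: the tolerable zone extends as far as the most distant
# consistent gene lies from the regression line.
tolerance = max(distance_to_line(x, y, slope, intercept)
                for x, y in consistent)

labels = {name: classify_inconsistent(x, y, slope, intercept, tolerance)
          for name, (x, y) in inconsistent.items()}
```

In this toy setup `geneA` lies inside the band spanned by the consistent genes and `geneB` lies far outside it, so `labels` separates them into the tolerable and intolerable groups. A real analysis would need a principled band-to-coordinate conversion and a tolerance derived from the observed overlap between the two distributions.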
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics