Metazoa gene expression is controlled by modular DNA segments called cis-regulatory modules (CRMs). CRMs can convey promoter/enhancer/insulator roles, generating additional regulation layers in transcription. Experiments for understanding CRM roles are low-throughput and costly. Large-scale CRM function investigation still depends on computational methods. However, existing in silico tools only recognize enhancers or promoters exclusively, thus accumulating errors when considering CRM promoter/enhancer/insulator roles altogether. Currently, no algorithm can concurrently consider these CRM roles. In this research, we developed the CRM Function Annotator (CFA) model. CFA provides complete CRM transcriptional role labeling based on epigenetic profiling interpretation. We demonstrated that CFA achieves high performance (test macro auROC/auPRC = 94.1%/90.3%) and outperforms existing tools in promoter/enhancer/insulator identification. CFA is also inspected to recognize explainable epigenetic codes consistent with previous findings when labeling CRM roles. By considering the higher-order combinations of the epigenetic codes, CFA significantly reduces false-positive rates in CRM transcriptional role annotation. CFA is available at https://github.com/cobisLab/CFA/.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Health Informatics