MicroRPM: A microRNA prediction model based only on plant small RNA sequencing data

Kuan Chieh Tseng, Yi Fan Chiang-Hsieh, Hsuan Pai, Chi Nga Chow, Shu Chuan Lee, Han Qin Zheng, Po Li Kuo, Guan Zhen Li, Yu Cheng Hung, Na Sheng Lin, Wen Chi Chang

Motivation: MicroRNAs (miRNAs) are endogenous non-coding small RNAs of about 22 nucleotides that play an important role in the post-transcriptional regulation of gene expression through either mRNA cleavage or translational inhibition. Several machine learning-based approaches have been developed to identify novel miRNAs from next-generation sequencing (NGS) data, but most of them require precursor or genomic sequences as references. Because genomic sequences are often unavailable for non-model plants, this requirement limits miRNA discovery in those species. A systematic approach that identifies novel miRNAs without reference sequences is therefore needed.

Results: In this study, an effective method was developed to identify miRNAs in non-model plants based only on NGS datasets. The miRNA prediction model was trained with a support vector machine (SVM) algorithm on several duplex structure-related features of mature miRNAs and their passenger strands. In independent tests, accuracy reached 96.61% for dicots (Arabidopsis) and 93.04% for monocots (rice). The model was also applied to real small RNA sequencing data from orchids: 21 predicted orchid miRNAs were selected for experimental validation, and 18 of them were confirmed by qRT-PCR. This approach has been implemented as a user-friendly program called microRPM (miRNA Prediction Model).

Availability and implementation: This resource is freely available at http://microRPM.itps.ncku.edu.tw.

Contact: nslin@sinica.edu.tw or sarah321@mail.ncku.edu.tw

Supplementary information: Supplementary data are available at Bioinformatics online.
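To make "duplex structure-related features" more concrete, here is a minimal Python sketch. It is not microRPM's actual feature set. It assumes the candidate mature miRNA and its passenger strand (miRNA*) form a duplex with 2-nt 3' overhangs on both strands, as is typical of Dicer-like cleavage products. Across the paired region it counts Watson-Crick pairs, G:U wobble pairs and mismatches, and it also records the GC content of the mature strand. The names `duplex_features` and `DuplexFeatures` and the fixed-overhang alignment are hypothetical simplifications; a real pipeline would more likely fold the two strands with an RNA folding tool.

```python
from dataclasses import dataclass

# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# G:U wobble pairs, which plant miRNA duplexes can tolerate
WOBBLE = {("G", "U"), ("U", "G")}


@dataclass
class DuplexFeatures:
    """Hypothetical per-candidate features; not microRPM's actual feature set."""
    mature_length: int
    passenger_length: int
    paired: int        # Watson-Crick pairs in the duplex region
    wobble: int        # G:U pairs in the duplex region
    mismatched: int    # all other positions in the duplex region
    gc_content: float  # fraction of G/C in the mature strand

    def as_vector(self) -> list[float]:
        """Flatten into a numeric vector, e.g. as input to an SVM classifier."""
        return [float(self.mature_length), float(self.passenger_length),
                float(self.paired), float(self.wobble),
                float(self.mismatched), self.gc_content]


def duplex_features(mature: str, passenger: str, overhang: int = 2) -> DuplexFeatures:
    """Score a mature/passenger pair, assuming `overhang`-nt 3' overhangs.

    Both strands are given 5'->3'. Reversing the passenger strand lines it up
    antiparallel to the mature strand. With the assumed 3' overhangs,
    mature[i] faces rev[i + overhang], and the last `overhang` nt of the
    mature strand stay unpaired.
    """
    # Accept DNA-style reads (T) as well as RNA (U)
    mature = mature.upper().replace("T", "U")
    passenger = passenger.upper().replace("T", "U")
    rev = passenger[::-1]

    paired = wobble = mismatched = 0
    for i in range(len(mature) - overhang):
        j = i + overhang
        if j >= len(rev):
            break  # passenger strand is too short to cover this position
        a, b = mature[i], rev[j]
        if COMPLEMENT.get(a) == b:
            paired += 1
        elif (a, b) in WOBBLE:
            wobble += 1
        else:
            mismatched += 1

    gc = sum(base in "GC" for base in mature) / len(mature) if mature else 0.0
    return DuplexFeatures(len(mature), len(passenger), paired, wobble, mismatched, gc)
```

Take a 21-nt candidate paired with a fully complementary 21-nt passenger strand that carries a 2-nt 3' overhang. For that input, the sketch reports 19 Watson-Crick pairs and no mismatches. Vectors from `as_vector()` for labelled positive and negative candidates could then be used to train an SVM. One option is scikit-learn's `sklearn.svm.SVC`, a library chosen here only for illustration.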

