Abstract
Background: An isomiR is an isoform of a microRNA (miRNA) whose sequence differs from that of the reference miRNA. With advances in deep sequencing, high sequence variability has been detected among miRNAs derived from the same precursor. IsomiRs fall into four main types, which arise from the following processes: 5' or 3' trimming, nucleotide addition, nucleotide removal, and post-transcriptional RNA editing.
Objective: Cancer diagnosis requires differential expression profiles that can distinguish cancer from normal cell lines, particularly within isomiR-mRNA regulatory networks, because aberrant isomiR expression profiles may contribute to tumorigenesis.
Method: We extracted five features from isomiR read counts in TCGA RNA-seq data. Using a random forest classifier, we applied these features to diagnose six cancers: breast invasive carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, stomach adenocarcinoma, thyroid carcinoma, and uterine corpus endometrial carcinoma.
Results: Compared with the libD3C classifier, our method distinguished cancers from their normal counterparts, with performance evaluated by sensitivity (Sn), specificity (Sp), accuracy (ACC), and the Matthews correlation coefficient (MCC).
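The four evaluation measures follow the standard confusion-matrix definitions for a binary classifier. The minimal sketch below assumes that cancer samples form the positive class and normal samples the negative class; the function name `binary_metrics` and the example counts are hypothetical illustrations, not values or code from this study.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute Sn, Sp, ACC and MCC from confusion-matrix counts.

    Assumption (for illustration only): cancer = positive class,
    normal = negative class. Both classes must contain at least one sample.
    """
    sn = tp / (tp + fn)                    # sensitivity: share of cancer samples detected
    sp = tn / (tn + fp)                    # specificity: share of normal samples correctly rejected
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy across both classes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any row or column of the matrix is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts: 100 cancer and 50 normal samples
    print(binary_metrics(tp=90, fn=10, tn=45, fp=5))
```

Unlike ACC, MCC stays informative when cancer and normal samples are unevenly represented, which is common when tumour samples outnumber matched normal tissue.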
Conclusion: IsomiRs can be used successfully and effectively to diagnose cancer from high-throughput data with machine learning methods.
Keywords: microRNA, isomiR, cancer, machine learning, high-throughput data, RNA-seq data.