Abstract
MicroRNAs (miRNAs) are a group of short non-coding RNA molecules that regulate gene expression. Many diseases are associated with abnormal miRNA expression, so accurate identification of miRNA precursors (pre-miRNAs) is necessary. Over the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. This review therefore summarizes current advances in computational pre-miRNA recognition, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the currently available predictors. Finally, we give future perspectives on the identification of pre-miRNAs. The review gives scholars a comprehensive background on pre-miRNA identification using machine learning methods and can help researchers gain a clear understanding of the progress of research in this field.
Keywords: microRNA, precursor, identification, machine learning methods, benchmark dataset, feature extraction, prediction algorithm.