Abstract
Background: This study explores advances in data repositories used to predict interactions between non-coding RNAs (ncRNAs) and their associated proteins. NcRNAs are a class of ribonucleic acids that lack protein-coding potential. A series of studies has indicated that ncRNAs play critical roles in epigenetic regulation, chromatin remodeling, transcription, and post-transcriptional processing. Since ncRNAs function together with associated proteins in complex biological processes, identifying ncRNA-protein interactions is important and can guide the exploration of the underlying molecular mechanisms. Recently, a variety of machine learning methods have emerged that cost less and take less time than experimental methods. In machine learning, the performance of classification models is often affected by the quality of the input samples and their features.
Aims: This study therefore introduces the data sources used in machine learning-based prediction of ncRNA-protein interactions (ncRPIs).
Methods: We searched the related literature in several sources, including PubMed, Web of Science, and Scopus, using the search terms “machine learning”, “repository”, “non-coding RNA”, and “protein”. We then described the databases used for dataset construction and feature representation in the ncRPI prediction task.
Results: This study reviews how benchmark datasets are constructed and how conventional features are represented in ncRPI prediction. The source, main functions, and development status of each database are also discussed.
Conclusion: With the development of high-throughput technologies for generating ncRPI data and the construction of related databases, machine learning will become a necessary research tool, enriching the methods available for ncRPI prediction. As databases grow and improve, more molecular structure, function, and genetic information becomes available for data mining, which enhances the credibility of machine learning-based ncRPI prediction. We believe that these databases will be more widely used in disease research, drug development, and many other fields.
Keywords: ncRNA-protein interaction, related databases of ncRNA-protein interaction, benchmark datasets, dataset construction, feature representation, machine learning.