Review Article

Recent Advances in Predicting Protein-lncRNA Interactions Using Machine Learning Methods

Volume 22, Issue 3, 2022

Published on: 12 July, 2021

Pages: 228-244 (17 pages)

DOI: 10.2174/1566523221666210712190718

Abstract

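Long non-coding RNAs (lncRNAs) are RNAs longer than 200 nucleotides that have little or no protein-coding capacity. A large body of research has shown that lncRNAs play important roles in diverse biological processes, including chromatin organization at the cellular level, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms. Because lncRNAs carry out many of their functions by interacting with proteins, identifying lncRNA-protein interactions is essential for understanding the molecular functions of lncRNAs. Experimental methods, however, are costly and time-consuming, and this has motivated the development of a variety of computational approaches. Many effective and novel machine learning methods have been developed recently. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter can be further divided into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this article, we focus on supervised learning methods and summarize recent methods for predicting lncRNA-protein interactions. We also compare the performance and characteristics of the different methods. Finally, in light of the limitations of existing models, we analyze the open problems and discuss directions for future research.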

Keywords: lncRNA-protein interaction prediction, computational model, machine learning, deep learning, LncRNA, chromatin organization
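The abstract above focuses on supervised predictors. One common first step in sequence-based supervised prediction is to turn each lncRNA-protein pair into a fixed-length numeric vector. The minimal sketch below does this by concatenating normalized k-mer frequencies of the RNA sequence with the amino-acid composition of the protein. The function names, the choice of k = 3 for RNA and k = 1 for protein, and the decision to skip windows containing non-standard characters are all illustrative assumptions. This is not the encoding used by any specific method summarized in the review.

```python
from itertools import product

RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmer_frequencies(seq: str, alphabet: str, k: int) -> list[float]:
    """Normalized counts of every possible k-mer over `alphabet` (lexicographic order).

    Hypothetical helper for illustration; windows containing characters outside
    `alphabet` (e.g. 'N' or 'X') are skipped and do not count toward the total.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    # An all-zero vector is returned when no valid window exists.
    return [c / total if total else 0.0 for c in counts]


def encode_pair(rna: str, protein: str, rna_k: int = 3, protein_k: int = 1) -> list[float]:
    """Hypothetical pair encoder: RNA k-mer profile followed by protein k-mer profile."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input for the transcript
    protein = protein.upper()
    return (kmer_frequencies(rna, RNA_ALPHABET, rna_k)
            + kmer_frequencies(protein, PROTEIN_ALPHABET, protein_k))


features = encode_pair("AUGGCUACGUAG", "MKTAYIAKQR")
print(len(features))  # 4**3 RNA features + 20 protein features = 84
```

In a supervised setting, vectors like these would be computed for known interacting pairs (positives) and for sampled non-interacting pairs (negatives), and then passed to a classifier. Deep learning-based methods may instead learn representations directly from the sequences rather than relying on fixed k-mer counts.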

© 2025 Bentham Science Publishers