ESDA: An Improved Approach to Accurately Identify Human snoRNAs for Precision Cancer Therapy

Yan-mei Dong; Jia-hao Bi; Qi-en He; Kai Song

doi:10.2174/1574893614666190424162230

Abstract

Background: SnoRNAs (small nucleolar RNAs) are small RNA molecules approximately 60-300 nucleotides in length. They have been shown to play important roles in cancer occurrence and progression, so it is of great clinical importance to identify new snoRNAs as quickly and accurately as possible.

Objective: A novel algorithm, ESDA (Elastically Sparse Partial Least Squares Discriminant Analysis), is proposed to improve both the speed and the performance of distinguishing snoRNAs from other RNAs in the human genome.

Methods: In the ESDA algorithm, kernel features are first selected from the variables extracted from both primary sequences and secondary structures, so that only the most informative variables are retained. These features are then used as input variables by the SPLSDA (sparse partial least squares discriminant analysis) algorithm to train the final classification model, which distinguishes snoRNA sequences from other human RNAs. Because no prior biological knowledge is required to optimize the classification model, ESDA is a practical method, especially for completely new sequences.
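The two-stage idea described in the Methods can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the selection step, so an elastic net fitted by coordinate descent is assumed here (as the keywords suggest), and a one-component sparse PLS-DA with a soft-thresholded loading vector stands in for SPLSDA. The data are synthetic, and all function names and penalty values (`elastic_net`, `spls_da_fit`, `l1`, `l2`, `lam`) are hypothetical.

```python
import random

def soft_threshold(v, lam):
    """Shrink v toward zero by lam; values within [-lam, lam] become 0."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def fit_scaler(X):
    """Column means and standard deviations estimated on the training set."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in X) / n) ** 0.5 or 1.0
           for j in range(p)]
    return means, sds

def scale(X, means, sds):
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in X]

def elastic_net(X, y, l1, l2, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xw||^2 + l1*|w|_1 + (l2/2)*|w|^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            col = [r[j] for r in X]
            # covariance of feature j with the residual that excludes feature j
            rho = sum(c * (e + c * w[j]) for c, e in zip(col, resid)) / n
            new_w = soft_threshold(rho, l1) / (sum(c * c for c in col) / n + l2)
            resid = [e - c * (new_w - w[j]) for c, e in zip(col, resid)]
            w[j] = new_w
    return w

def spls_da_fit(X, y, lam):
    """One-component sparse PLS-DA: soft-threshold X'y, then regress y on the score."""
    n, p = len(X), len(X[0])
    cov = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    load = [soft_threshold(c, lam) for c in cov]
    norm = sum(v * v for v in load) ** 0.5 or 1.0
    load = [v / norm for v in load]
    scores = [sum(a * b for a, b in zip(r, load)) for r in X]
    slope = sum(t * v for t, v in zip(scores, y)) / (sum(t * t for t in scores) or 1.0)
    return load, slope

def take(Z, cols):
    """Keep only the selected feature columns."""
    return [[r[j] for j in cols] for r in Z]

# Synthetic stand-in data (NOT from the paper): 20 features, only 0, 1, 2 informative.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(300)]
labels = [1.0 if r[0] + r[1] - r[2] > 0 else 0.0 for r in X]
X_tr, X_te, y_tr, y_te = X[:200], X[200:], labels[:200], labels[200:]

means, sds = fit_scaler(X_tr)
Z_tr, Z_te = scale(X_tr, means, sds), scale(X_te, means, sds)
y_mean = sum(y_tr) / len(y_tr)
y_c = [v - y_mean for v in y_tr]  # centered 0/1 response

# Stage 1: keep features with a nonzero elastic-net coefficient.
coef = elastic_net(Z_tr, y_c, l1=0.05, l2=0.05)
selected = [j for j, c in enumerate(coef) if c != 0.0]

# Stage 2: sparse PLS-DA on the selected features, then classify held-out rows.
load, slope = spls_da_fit(take(Z_tr, selected), y_c, lam=0.02)
pred = [1.0 if y_mean + slope * sum(a * b for a, b in zip(r, load)) > 0.5 else 0.0
        for r in take(Z_te, selected)]
accuracy = sum(a == b for a, b in zip(pred, y_te)) / len(y_te)
print("selected features:", selected)
print("held-out accuracy:", accuracy)
```

In this sketch the first stage only decides which variables survive, and the second stage builds the discriminant on the survivors. A real application would replace the synthetic matrix with sequence and structure features and tune the penalties by cross-validation.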

Results: 89 human H/ACA snoRNAs and 269 human C/D snoRNAs were used as positive samples, and 3403 non-snoRNAs as negative samples, to test the identification performance of ESDA. For H/ACA snoRNAs, the sensitivity and specificity were as high as 99.6% and 98.8%, respectively. For C/D snoRNAs, they were 96.1% and 98.3%, respectively. Furthermore, ESDA was compared with other widely used algorithms and classifiers: SnoReport, RF (Random Forest), DWD (Distance Weighted Discrimination) and SVM (Support Vector Machine). The largest improvement in accuracy obtained by ESDA was 25.1%.
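For readers less familiar with the reported metrics, the short sketch below shows how sensitivity, specificity and accuracy are conventionally computed from confusion-matrix counts. The counts in the usage lines are invented purely for illustration and are not taken from the paper; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positives (e.g. snoRNAs) classified correctly/incorrectly.
    tn/fp: negatives (e.g. non-snoRNAs) classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Illustrative counts only (not the paper's data).
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=950, fp=50)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because negatives far outnumber positives in a data set of this kind, accuracy is dominated by specificity, which is why sensitivity and specificity are reported separately.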

Conclusion: These results demonstrate the superior performance of ESDA and make it a promising tool for identifying snoRNAs in support of the further development of precision medicine for cancer.

Keywords: Human snoRNA, elastic net algorithm, sparse partial least squares discriminant analysis, identification, algorithm, development, cancer.




DOI: https://dx.doi.org/10.2174/1574893614666190424162230
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
