Using the Chou’s Pseudo Component to Predict the ncRNA Locations Based on the Improved K-Nearest Neighbor (iKNN) Classifier

Chengyan Wu; Qianzhong Li; Ru Xing; Guo-Liang Fan

doi:10.2174/1574893614666191003142406

Abstract

Background: Identifying non-coding RNA (ncRNA) at the organelle genome level is a challenging task. In our previous work, we built an ncRNA dataset with less than 80% sequence identity and proposed a method that combines the increment of diversity with a support vector machine.

Objective: Based on the ncRNA_361 dataset, we propose a novel decision-making method, an improved K-nearest neighbor (iKNN) classifier.

Methods: In this paper, the physicochemical features of nucleotides, the degeneracy of genetic codons, and the topological secondary structure were selected to represent the characteristic features of ncRNA for the iKNN algorithm. The incremental feature selection method was then used to optimize the feature set.
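As a rough illustration of incremental feature selection, the sketch below adds features one at a time in a fixed ranked order and keeps the prefix that scores best. The abstract does not state how features are ranked or scored, so the ranking, the `evaluate` callback, and the toy scores here are all hypothetical placeholders rather than the authors' procedure.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set in ranked order and keep the best-scoring prefix.

    ranked_features: feature names, most relevant first (ranking is assumed given).
    evaluate: callable taking a list of feature names and returning a score,
              for example a cross-validated accuracy (assumption).
    """
    subset = []
    best_subset, best_score = [], float("-inf")
    for feature in ranked_features:
        subset.append(feature)
        score = evaluate(subset)
        # Strict '>' keeps the smaller subset when two prefixes tie.
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Toy usage with made-up scores indexed by subset size (hypothetical values).
toy_scores = {1: 0.70, 2: 0.90, 3: 0.85}
selected, score = incremental_feature_selection(
    ["f1", "f2", "f3"], lambda s: toy_scores[len(s)]
)
```

In practice the `evaluate` callback would wrap the classifier and the validation scheme, so the same loop works with any ranking criterion.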

Results: The results show that the mean-value decision rule of iKNN is distinctly superior both to the traditional majority-vote decision rule and to the Increment of Diversity Combining Support Vector Machine (ID-SVM) method. With k = 3, the iKNN algorithm achieved an overall accuracy of 97.368% in the jackknife test.
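To make the contrast between the two decision rules concrete, here is a minimal sketch of a majority-vote KNN next to one plausible reading of a mean-value rule, evaluated with a jackknife (leave-one-out) test. The abstract does not define the mean-value rule, so the version below (average distance to the k nearest members of each class, smallest mean wins) and the use of Euclidean distance are assumptions, not the published iKNN.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_vote(train_X, train_y, query, k):
    """Traditional KNN: the most frequent label among the k nearest samples."""
    nearest = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def mean_value(train_X, train_y, query, k):
    """Assumed mean-value rule: per class, average the distances to its k
    nearest members (all members if fewer than k); the smallest mean wins."""
    dists = defaultdict(list)
    for x, label in zip(train_X, train_y):
        dists[label].append(euclidean(x, query))
    means = {}
    for label, values in dists.items():
        closest = sorted(values)[:k]
        means[label] = sum(closest) / len(closest)
    return min(means, key=means.get)


def jackknife_accuracy(X, y, k, rule):
    """Leave each sample out in turn, predict it from the rest, count hits."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if rule(train_X, train_y, X[i], k) == y[i]:
            hits += 1
    return hits / len(X)
```

The two rules can disagree when one class has a single very close sample and another class has several distant ones: majority vote follows the count, while the mean-value rule follows the distances.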

Conclusion: Notably, the structure-sequence triplets taken under reading frames not only contain the entire sequence information but also reflect whether each base is paired, and the topological parameters of the secondary structure further describe the ncRNA secondary structure at the spatial level. The ncRNA dataset and the iKNN classifier are freely available at http://202.207.14.87:8032/fuwu/iKNN/index.asp.
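One way to picture a structure-sequence triplet is sketched below. The abstract does not give the exact encoding, so this is an assumption: each base is tagged as paired (`P`) or unpaired (`U`) from a dot-bracket secondary structure string, and non-overlapping triplets of these tagged bases are counted in one of the three reading frames. The function name and the `P`/`U` labels are hypothetical.

```python
from collections import Counter


def structure_sequence_triplets(seq, dot_bracket, frame):
    """Count non-overlapping structure-sequence triplets in one reading frame.

    seq: RNA sequence, e.g. "GCAUGC".
    dot_bracket: secondary structure of the same length, e.g. "((..))".
    frame: reading-frame offset, one of 0, 1, 2.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    # '(' and ')' both mean the base is paired; '.' means unpaired.
    symbols = [
        base + ("U" if state == "." else "P")
        for base, state in zip(seq, dot_bracket)
    ]
    counts = Counter()
    # Step by 3 so that triplets do not overlap within the chosen frame.
    for i in range(frame, len(symbols) - 2, 3):
        counts["".join(symbols[i:i + 3])] += 1
    return counts


# Toy usage: a six-base hairpin-like example read in frame 0.
frame0 = structure_sequence_triplets("GCAUGC", "((..))", 0)
```

Each triplet key keeps the three nucleotides and their pairing states together, which is how a single feature can carry both sequence and structure information.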

Keywords: Organelle genome, non-coding RNA, open reading frame, spatial structure, feature selection, K-nearest neighbor method.




DOI https://dx.doi.org/10.2174/1574893614666191003142406
Print ISSN 1574-8936
Publisher Name Bentham Science Publisher
Online ISSN 2212-392X

Current Bioinformatics
