Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families

Zi-Mei Zhang; Zheng-Xing Guan; Fang Wang; Dan Zhang; Hui Ding
doi:10.2174/1573406415666191004125551
Abstract

Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. Based on alignments of their conserved domains, NRs are classified into either seven or eight subfamilies: (1) NR1: thyroid hormone like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1 like (fushi tarazu-F1 like), (6) NR6: germ cell nuclear factor like (germ cell nuclear factor), and (7) NR0: knirps like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX like (DAX, SHP). In the eight-subfamily scheme, NR0 is instead divided into (7) NR7: knirps like and (8) NR8: DAX like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomics era, the number of proteins with known sequences has grown explosively. Conventional methods for classifying NRs into families are experimental and suffer from high cost and low efficiency. This has created a greater need for bioinformatics tools that can effectively recognize NRs and their subfamilies in order to understand their biological function. In this review, we summarize the application of machine learning methods to the prediction of NRs from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.
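The two labelling schemes described in the abstract can be sketched as a small lookup table. This is only an illustration of how the seven-subfamily and eight-subfamily class labels relate; the subfamily labels and names come from the abstract, while the dictionary and function names are hypothetical and do not correspond to any published predictor.

```python
# Subfamily labels and names are taken from the abstract; the identifiers
# SEVEN_SUBFAMILY_SCHEME and to_eight_subfamily_scheme are hypothetical,
# chosen only for this illustration.
SEVEN_SUBFAMILY_SCHEME = {
    "NR1": ("thyroid hormone like",),
    "NR2": ("HNF4-like",),
    "NR3": ("estrogen-like",),
    "NR4": ("nerve growth factor IB-like",),
    "NR5": ("fushi tarazu-F1 like",),
    "NR6": ("germ cell nuclear factor like",),
    "NR0": ("knirps like", "DAX like"),
}


def to_eight_subfamily_scheme(seven):
    """Split NR0 into NR7 (knirps like) and NR8 (DAX like)."""
    # Copy every subfamily except NR0 unchanged.
    eight = {label: groups for label, groups in seven.items() if label != "NR0"}
    knirps_like, dax_like = seven["NR0"]
    eight["NR7"] = (knirps_like,)
    eight["NR8"] = (dax_like,)
    return eight


EIGHT_SUBFAMILY_SCHEME = to_eight_subfamily_scheme(SEVEN_SUBFAMILY_SCHEME)

if __name__ == "__main__":
    for label in sorted(EIGHT_SUBFAMILY_SCHEME):
        print(label, ", ".join(EIGHT_SUBFAMILY_SCHEME[label]))
```

A subfamily classifier would use one of these label sets as its target classes, so the choice of scheme fixes whether the task has seven or eight classes.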
Keywords: Nuclear receptors (NRs), NRs families, prediction, classification, machine learning methods, feature selection.
[1] 
Liu, K.; Zou, C.; Qin, B. The association between nuclear receptors and ocular diseases. Oncotarget,  2017, 8(16), 27603-27615.
[http://dx.doi.org/10.18632/oncotarget.15178] [PMID: 28187442] 
[2] 
Wang, H.; Hu, X. Accurate prediction of nuclear receptors with conjoint triad feature. BMC Bioinformatics,  2015, 16, 402.
[http://dx.doi.org/10.1186/s12859-015-0828-1] [PMID: 26630876] 
[3] 
Gao, Q.B.; Jin, Z.C.; Ye, X.F.; Wu, C.; He, J. Prediction of nuclear receptors with optimal pseudo amino acid composition. Anal. Biochem.,  2009, 387(1), 54-59.
[http://dx.doi.org/10.1016/j.ab.2009.01.018] [PMID: 19454254] 
[4] 
Altucci, L.; Gronemeyer, H. Nuclear receptors in cell life and death. Trends Endocrinol. Metab.,  2001, 12(10), 460-468.
[http://dx.doi.org/10.1016/S1043-2760(01)00502-1] [PMID: 11701345] 
[5] 
Mangelsdorf, D.J.; Thummel, C.; Beato, M.; Herrlich, P.; Schütz, G.; Umesono, K.; Blumberg, B.; Kastner, P.; Mark, M.; Chambon, P.; Evans, R.M. The nuclear receptor superfamily: the second decade. Cell,  1995, 83(6), 835-839.
[http://dx.doi.org/10.1016/0092-8674(95)90199-X]] [PMID: 8521507] 
[6] 
Gronemeyer, H.; Laudet, V. Transcription factors 3: nuclear receptors. Protein Profile,  1995, 2(11), 1173-1308.
[PMID: 8681033] 
[7] 
Lazar, M.A. Maturing of the nuclear receptor family. J. Clin. Invest.,  2017, 127(4), 1123-1125.
[http://dx.doi.org/10.1172/JCI92949] [PMID: 28368290] 
[8] 
Cheng, L.; Zhuang, H.; Yang, S.; Jiang, H.; Wang, S.; Zhang, J. Exposing the causal effect of c-reactive protein on the risk of type 2 diabetes mellitus: A mendelian randomization study. Front. Genet.,  2018, 9, 657.
[http://dx.doi.org/10.3389/fgene.2018.00657] [PMID: 30619477] 
[9] 
Cheng, L.; Wang, P.; Tian, R.; Wang, S.; Guo, Q.; Luo, M.; Zhou, W.; Liu, G.; Jiang, H.; Jiang, Q. LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res.,  2019, 47(D1), D140-D144.
[http://dx.doi.org/10.1093/nar/gky1051] [PMID: 30380072] 
[10] 
Bhasin, M.; Raghava, G.P. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J. Biol. Chem.,  2004, 279(22), 23262-23266.
[http://dx.doi.org/10.1074/jbc.M401932200] [PMID: 15039428] 
[11] 
Horn, F.; Vriend, G.; Cohen, F.E. Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res.,  2001, 29(1), 346-349.
[http://dx.doi.org/10.1093/nar/29.1.346] [PMID: 11125133] 
[12] 
Robinson-Rechavi, M.; Escriva Garcia, H.; Laudet, V. The nuclear receptor superfamily. J. Cell Sci.,  2003, 116(Pt 4), 585-586.
[http://dx.doi.org/10.1242/jcs.00247] [PMID: 12538758] 
[13] 
Nuclear Receptors Nomenclature, C. A unified nomenclature system for the nuclear receptor superfamily. Cell,  1999, 97(2), 161-163.
[http://dx.doi.org/10.1016/S0092-8674(00)80726-6] [PMID: 10219237] 
[14] 
Laudet, V. Evolution of the nuclear receptor superfamily: early diversification from an ancestral orphan receptor. J. Mol. Endocrinol.,  1997, 19(3), 207-226.
[http://dx.doi.org/10.1677/jme.0.0190207]] [PMID: 9460643] 
[15] 
Wang, P.; Xiao, X.; Chou, K.C. NR-2L: a two-level predictor for identifying nuclear receptor subfamilies based on sequence-derived features. PLoS One,  2011, 6(8) e23505
[http://dx.doi.org/10.1371/journal.pone.0023505] [PMID: 21858146] 
[16] 
Bhasin, M.; Raghava, G.P. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res.,  2004, 34, W414-W419.
[17] 
Xiao, X.; Wang, P.; Chou, K.C. iNR-PhysChem: a sequence-based predictor for identifying nuclear receptors and their subfamilies via physical-chemical property matrix. PLoS One,  2012, 7(2) e30869
[http://dx.doi.org/10.1371/journal.pone.0030869] [PMID: 22363503] 
[18] 
Kumar, R.; Kumari, B.; Srivastava, A.; Kumar, M. NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families. Sci. Rep.,  2014, 4, 6810.
[http://dx.doi.org/10.1038/srep06810] [PMID: 25351274] 
[19] 
Kabir, M.; Ahmad, S.; Iqbal, M.; Hayat, M. iNR-2L: A two-level sequence-based predictor developed via Chou’s 5-steps rule and general PseAAC for identifying nuclear receptors and their families.  Genomics,   2019,  S0888-7543(18), 30694-3.
[http://dx.doi.org/10.1016/j.ygeno.2019.02.006] [PMID: 30779939] 
[20] 
Vroling, B.; Thorne, D.; McDermott, P.; Joosten, H.J.; Attwood, T.K.; Pettifer, S.; Vriend, G. NucleaRDB: information system for nuclear receptors. Nucleic Acids Res.,  2012, 40(Database issue), D377-D380.
[http://dx.doi.org/10.1093/nar/gkr960] [PMID: 22064856] 
[21] 
Horn, F.; Lau, A.L.; Cohen, F.E. Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors. Bioinformatics,  2004, 20(4), 557-568.
[http://dx.doi.org/10.1093/bioinformatics/btg449] [PMID: 14990452] 
[22] 
Bettler, E.; Krause, R.; Horn, F.; Vriend, G. NRSAS: Nuclear Receptor Structure Analysis Servers. Nucleic Acids Res.,  2003, 31(13), 3400-3403.
[http://dx.doi.org/10.1093/nar/gkg505] [PMID: 12824335] 
[23] 
Schuffenhauer, A.; Zimmermann, J.; Stoop, R.; van der Vyver, J.J.; Lecchini, S.; Jacoby, E. An ontology for pharmaceutical ligands and its application for in silico screening and library design. J. Chem. Inf. Comput. Sci.,  2002, 42(4), 947-955.
[http://dx.doi.org/10.1021/ci010385k] [PMID: 12132896] 
[24] 
Ma, X. Investigation of antineutrino spectral anomaly with updated nuclear database, 2018.
[25] 
UniProt: a hub for protein information. Nucleic Acids Res.,  2015, 43(Database issue), D204-D212.
[PMID: 25348405] 
[26] 
Pundir, S.; Martin, M.J.; O'Donovan, C. UniProt Tools  Curr. Protoc. Bioinformatics,  2016,  53,  1.29.1-1.29.15.
[http://dx.doi.org/10.1002/0471250953.bi0129s53] 
[27] 
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res.,  2017, 45(D1), D158-D169.
[http://dx.doi.org/10.1093/nar/gkw1099] [PMID: 27899622] 
[28] 
Huang, Y.; Niu, B.; Gao, Y.; Fu, L.; Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics,  2010, 26(5), 680-682.
[http://dx.doi.org/10.1093/bioinformatics/btq003] [PMID: 20053844] 
[29] 
Li, W.; Jaroszewski, L.; Godzik, A. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics,  2002, 18(1), 77-82.
[http://dx.doi.org/10.1093/bioinformatics/18.1.77] [PMID: 11836214] 
[30] 
Li, W.; Jaroszewski, L.; Godzik, A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics,  2001, 17(3), 282-283.
[http://dx.doi.org/10.1093/bioinformatics/17.3.282] [PMID: 11294794] 
[31] 
Li, W.; Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics,  2006, 22(13), 1658-1659.
[http://dx.doi.org/10.1093/bioinformatics/btl158 ] [PMID: 16731699] 
[32] 
Zou, Q. Sequence clustering in bioinformatics: an empirical study. Brief. Bioinform., 2019.
[http://dx.doi.org/10.1093/bib/bby090] 
[33] 
Brendel, V. PROSET-a fast procedure to create non-redundant sets of protein sequences. Math. Comput. Model.,  1992, 16(6-7), 37-43.
[http://dx.doi.org/10.1016/0895-7177(92)90150-J] 
[34] 
Liu, D.; Li, G.; Zuo, Y. Function determinants of TET proteins: the arrangements of sequence motifs with specific codes. Brief. Bioinform., 2018.
[http://dx.doi.org/10.1093/bib/bby053] [PMID: 29947743] 
[35] 
Cao, R.; Freitas, C.; Chan, L.; Sun, M.; Jiang, H.; Chen, Z. Pro- LanGO: Protein function prediction using neural machine translation based on a recurrent neural network. Molecules,  2017, 22(10) E1732
[http://dx.doi.org/10.3390/molecules22101732] [PMID: 29039790] 
[36] 
Chou, K.C.; Zhang, C.T. Predicting protein folding types by distance functions that make allowances for amino acid interactions. J. Biol. Chem.,  1994, 269(35), 22014-22020.
[PMID: 8071322] 
[37] 
Washio, J.; Ogawa, T.; Suzuki, K.; Tsukiboshi, Y.; Watanabe, M.; Takahashi, N. Amino acid composition and amino acid-metabolic network in supragingival plaque. Biomed. Res.,  2016, 37(4), 251-257.
[http://dx.doi.org/10.2220/biomedres.37.251] [PMID: 27545001] 
[38] 
Cao, R.; Bhattacharya, D.; Hou, J.; Cheng, J. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics,  2016, 17(1), 495.
[http://dx.doi.org/10.1186/s12859-016-1405-y] [PMID: 27919220] 
[39] 
Cao, R.; Cheng, J. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods,  2016, 93, 84-91.
[http://dx.doi.org/10.1016/j.ymeth.2015.09.011] [PMID: 26370280] 
[40] 
Waris, M. Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix. Neurocomputing,  2016, 199, 154-162.
[http://dx.doi.org/10.1016/j.neucom.2016.03.025] 
[41] 
Pan, Y.; Wang, S.; Zhang, Q.; Lu, Q.; Su, D.; Zuo, Y.; Yang, L. Analysis and prediction of animal toxins by various Chou’s pseudo components and reduced amino acid compositions. J. Theor. Biol.,  2019, 462, 221-229.
[http://dx.doi.org/10.1016/j.jtbi.2018.11.010] [PMID: 30452961] 
[42] 
Basith, S.; Manavalan, B.; Shin, T.H.; Lee, G. iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree. Comput. Struct. Biotechnol. J.,  2018, 16, 412-420.
[http://dx.doi.org/10.1016/j.csbj.2018.10.007] [PMID: 30425802] 
[43] 
Manavalan, B.; Govindaraj, R.G.; Shin, T.H.; Kim, M.O.; Lee, G. iBCE-EL: A new ensemble learning framework for improved linear B-Cell epitope prediction. Front. Immunol.,  2018, 9, 1695.
[http://dx.doi.org/10.3389/fimmu.2018.01695] [PMID: 30100904] 
[44] 
Manavalan, B.; Shin, T.H.; Kim, M.O.; Lee, G. PIP-EL: A new ensemble learning method for improved proinflammatory peptide predictions. Front. Immunol.,  2018, 9, 1783.
[http://dx.doi.org/10.3389/fimmu.2018.01783] [PMID: 30108593] 
[45] 
Hayat, M.; Khan, A. Prediction of membrane protein types by using dipeptide and pseudo amino acid composition-based composite features. IET Commun.,  2012, 6(18), 3257-3264.
[http://dx.doi.org/10.1049/iet-com.2011.0170] 
[46] 
Ding, H.; Deng, E.Z.; Yuan, L.F.; Liu, L.; Lin, H.; Chen, W.; Chou, K.C. iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Res. Int.,  2014, 2014, 286419
[http://dx.doi.org/10.1155/2014/286419] [PMID: 24991545] 
[47] 
Lai, H.Y.; Chen, X.X.; Chen, W.; Tang, H.; Lin, H. Sequence-based predictive modeling to identify cancerlectins. Oncotarget,  2017, 8(17), 28169-28175.
[http://dx.doi.org/10.18632/oncotarget.15963] [PMID: 28423655] 
[48] 
Lin, H.; Chen, W.; Yuan, L.F.; Li, Z.Q.; Ding, H. Using over-represented tetrapeptides to predict protein submitochondria locations. Acta Biotheor.,  2013, 61(2), 259-268.
[http://dx.doi.org/10.1007/s10441-013-9181-9] [PMID: 23475502] 
[49] 
Zhu, P.P.; Li, W.C.; Zhong, Z.J.; Deng, E.Z.; Ding, H.; Chen, W.; Lin, H. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol. Biosyst.,  2015, 11(2), 558-563.
[http://dx.doi.org/10.1039/C4MB00645C] [PMID: 25437899] 
[50] 
Ding, C.; Yuan, L.F.; Guo, S.H.; Lin, H.; Chen, W. Identification of mycobacterial membrane proteins and their types using over represented tripeptide compositions. J. Proteomics,  2012, 77, 321-328.
[http://dx.doi.org/10.1016/j.jprot.2012.09.006] [PMID: 23000219] 
[51] 
Liu, W.X.; Deng, E.Z.; Chen, W.; Lin, H. Identifying the subfamilies of voltage-gated potassium channels using feature selection technique. Int. J. Mol. Sci.,  2014, 15(7), 12940-12951.
[http://dx.doi.org/10.3390/ijms150712940] [PMID: 25054318] 
[52] 
Nanni, L.; Lumini, A. Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids,  2008, 34(4), 653-660.
[http://dx.doi.org/10.1007/s00726-007-0018-1] [PMID: 18175047] 
[53] 
Nanni, L. Identifying bacterial virulent proteins by fusing a set of classifiers based on variants of Chou’s pseudo amino acid composition and on evolutionary information. IEEE/ACM Trans. Comput. Biol. Bioinform,  2012, 9(2), 467-75.
[http://dx.doi.org/10.1109/TCBB.2011.117] 
[54] 
Qiu, J.D.; Huang, J.H.; Liang, R.P.; Lu, X.Q. Prediction of G protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: an approach from discrete wavelet transform. Anal. Biochem.,  2009, 390(1), 68-73.
[http://dx.doi.org/10.1016/j.ab.2009.04.009] [PMID: 19364489] 
[55] 
Mohabatkar, H.; Beigi, M.M.; Abdolahi, K.; Mohsenzadeh, S. Prediction of allergenic proteins by means of the concept of Chou’s pseudo amino acid composition and a machine learning approach. Med. Chem.,  2013, 9(1), 133-137.
[http://dx.doi.org/10.2174/157340613804488341] [PMID: 22931491] 
[56] 
Ding, H.; Yang, W.; Tang, H.; Feng, P.M.; Huang, J.; Chen, W.; Lin, H. PHYPred: a tool for identifying bacteriophage enzymes and hydrolases. Virol. Sin.,  2016, 31(4), 350-352.
[http://dx.doi.org/10.1007/s12250-016-3740-6] [PMID: 27151186] 
[57] 
Yang, W. A brief survey of machine learning methods in protein sub-Golgi localization. Curr. Bioinform.,  2019, 14, 234-240.
[http://dx.doi.org/10.2174/1574893613666181113131415] 
[58] 
Zuo, Y.; Li, Y.; Chen, Y.; Li, G.; Yan, Z.; Yang, L. PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition. Bioinformatics,  2017, 33(1), 122-124.
[http://dx.doi.org/10.1093/bioinformatics/btw564] [PMID: 27565583] 
[59] 
Chou, K.C. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins,  2001, 43(3), 246-255.
[http://dx.doi.org/10.1002/prot.1035] [PMID: 11288174] 
[60] 
Xiao, X.; Shao, S.H.; Huang, Z.D.; Chou, K.C. Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor. J. Comput. Chem.,  2006, 27(4), 478-482.
[http://dx.doi.org/10.1002/jcc.20354] [PMID: 16429410] 
[61] 
Gusev, V.D.; Nemytikova, L.A.; Chuzhanova, N.A. On the complexity measures of genetic sequences. Bioinformatics,  1999, 15(12), 994-999.
[http://dx.doi.org/10.1093/bioinformatics/15.12.994] [PMID: 10745989] 
[62] 
Shen, J.; Zhang, J.; Luo, X.; Zhu, W.; Yu, K.; Chen, K.; Li, Y.; Jiang, H. Predicting protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA,  2007, 104(11), 4337-4341.
[http://dx.doi.org/10.1073/pnas.0607879104] [PMID: 17360525] 
[63] 
Basu, S.; Pan, A.; Dutta, C.; Das, J. Chaos game representation of proteins. J. Mol. Graph. Model.,  1997, 15(5), 279-289.
[http://dx.doi.org/10.1016/S1093-3263(97)00106-X]] [PMID: 9640559] 
[64] 
Jeffrey, H.J. Chaos game representation of gene structure. Nucleic Acids Res.,  1990, 18(8), 2163-2170.
[http://dx.doi.org/10.1093/nar/18.8.2163]] [PMID: 2336393] 
[65] 
Yang, J.Y.; Peng, Z.L.; Yu, Z.G.; Zhang, R.J.; Anh, V.; Wang, D. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation. J. Theor. Biol.,  2009, 257(4), 618-626.
[http://dx.doi.org/10.1016/j.jtbi.2008.12.027] [PMID: 19183559] 
[66] 
Lu, J.L.; Hu, X.H.; Hu, D.G. A new hybrid fractal algorithm for predicting thermophilic nucleotide sequences. J. Theor. Biol.,  2012, 293, 74-81.
[http://dx.doi.org/10.1016/j.jtbi.2011.09.028] [PMID: 22001320] 
[67] 
Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol.,  2005, 3(2), 185-205.
[http://dx.doi.org/10.1142/S0219720005001004] [PMID: 15852500] 
[68] 
Naseem, I. ECMSRC: A Sparse Learning Approach for the Prediction of Extracellular Matrix Proteins. Curr. Bioinform.,  2017, 12(4), 361-368.
[http://dx.doi.org/10.2174/1574893611666151215213508] 
[69] 
Cai, Y.; Huang, T.; Hu, L.; Shi, X.; Xie, L.; Li, Y. Prediction of lysine ubiquitination with mRMR feature selection and analysis. Amino Acids,  2012, 42(4), 1387-1395.
[http://dx.doi.org/10.1007/s00726-011-0835-0] [PMID: 21267749] 
[70] 
Zou, Q. A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing,  2016, 173, 346-354.
[http://dx.doi.org/10.1016/j.neucom.2014.12.123] 
[71] 
Zhu, Y.; Shen, X.; Pan, W. Network-based support vector machine for classification of microarray samples. BMC Bioinformatics,  2009, 10(Suppl. 1), S21.
[http://dx.doi.org/10.1186/1471-2105-10-S1-S21] [PMID: 19208121] 
[72] 
O’Fallon, B.D.; Wooderchak-Donahue, W.; Crockett, D.K. A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data. Bioinformatics,  2013, 29(11), 1361-1366.
[http://dx.doi.org/10.1093/bioinformatics/btt172] [PMID: 23620357] 
[73] 
Furey, T.S.; Cristianini, N.; Duffy, N.; Bednarski, D.W.; Schummer, M.; Haussler, D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics,  2000, 16(10), 906-914.
[http://dx.doi.org/10.1093/bioinformatics/16.10.906] [PMID: 11120680] 
[74] 
Li, T.; Li, Q.Z.; Liu, S.; Fan, G.L.; Zuo, Y.C.; Peng, Y. PreDNA: accurate prediction of DNA-binding sites in proteins by integrating sequence and geometric structure information. Bioinformatics,  2013, 29(6), 678-685.
[http://dx.doi.org/10.1093/bioinformatics/btt029] [PMID: 23335013] 
[75] 
Fletez-Brant, C. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.  Nucleic Acids Res.,,   2013,  41(Web Server issue),  W544-56.
[76] 
Kumar, M.; Gromiha, M.M.; Raghava, G.P. Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins,  2008, 71(1), 189-194.
[http://dx.doi.org/10.1002/prot.21677] [PMID: 17932917] 
[77] 
Liu, Y.; Guo, J.; Hu, G.; Zhu, H. Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformatics,  2013, 14(Suppl. 5), S12.
[http://dx.doi.org/10.1186/1471-2105-14-S5-S12] [PMID: 23735199] 
[78] 
Ramana, J.; Gupta, D. LipocalinPred: a SVM-based method for prediction of lipocalins. BMC Bioinformatics,  2009, 10, 445.
[http://dx.doi.org/10.1186/1471-2105-10-445] [PMID: 20030857] 
[79] 
Huang, W.L.; Tung, C.W.; Huang, H.L.; Hwang, S.F.; Ho, S.Y. ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. Biosystems,  2007, 90(2), 573-581.
[http://dx.doi.org/10.1016/j.biosystems.2007.01.001] [PMID: 17291684] 
[80] 
Bu, H.D. Predicting Enhancers from multiple cell lines and tissues across different developmental stages based on svm method. Curr. Bioinform.,  2018, 13(6), 655-660.
[http://dx.doi.org/10.2174/1574893613666180726163429] 
[81] 
Li, D.; Ju, Y.; Zou, Q. Protein folds prediction with hierarchical structured SVM. Curr. Proteomics,  2016, 13(2), 79-85.
[http://dx.doi.org/10.2174/157016461302160514000940] 
[82] 
Chen, W.; Lv, H.; Nie, F.; Lin, H. i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics,  2019, 35(16), 2796-2800.
[http://dx.doi.org/10.1093/bioinformatics/btz015] [PMID: 30624619] 
[83] 
Zuo, Y.; Lv, Y.; Wei, Z.; Yang, L.; Li, G.; Fan, G. iDPF-PseRAAAC: A Web-Server for identifying the defensin peptide family and subfamily using pseudo reduced amino acid alphabet composition. PLoS One,  2015, 10(12) e0145541
[http://dx.doi.org/10.1371/journal.pone.0145541] [PMID: 26713618] 
[84] 
Tang, H. A two-step discriminated method to identify thermophilic proteins.  Int. J. Biomath.,,   2017,  10(4)  10, 1750050.
[http://dx.doi.org/10.1142/S1793524517500504] 
[85] 
Cao, R.; Wang, Z.; Wang, Y.; Cheng, J. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics,  2014, 15, 120.
[http://dx.doi.org/10.1186/1471-2105-15-120] [PMID: 24776231] 
[86] 
Manavalan, B.; Shin, T.H.; Lee, G. PVP-SVM: sequence-based prediction of phage virion proteins using a support vector machine. Front. Microbiol.,  2018, 9, 476.
[http://dx.doi.org/10.3389/fmicb.2018.00476] [PMID: 29616000] 
[87] 
Boopathi, V.; Subramaniyam, S.; Malik, A.; Lee, G.; Manavalan, B.; Yang, D.C. mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides. Int. J. Mol. Sci.,  2019, 20(8) E1964
[http://dx.doi.org/10.3390/ijms20081964] [PMID: 31013619] 
[88] 
Manavalan, B. Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA N4-methylcytosine Site Prediction Using Effective Feature Representation. Mol. Ther. Nucleic Acids,  2019, 16, 733-744.
[http://dx.doi.org/10.1016/j.omtn.2019.04.019] 
[89] 
Wei, L.; Su, R.; Luan, S.; Liao, Z.; Manavalan, B.; Zou, Q.; Shi, X. Iterative feature representations improve N4-methylcytosine site prediction. Bioinformatics,  2019, 35(23), 4930-4937.
[http://dx.doi.org/10.1093/bioinformatics/btz408] [PMID: 31099381] 
[90] 
Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory,  1967, 13(1), 21-27.
[http://dx.doi.org/10.1109/TIT.1967.1053964] 
[91] 
Zhang, Z. Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med.,  2016, 4(11), 218.
[http://dx.doi.org/10.21037/atm.2016.03.37] [PMID: 27386492] 
[92] 
Ning, Q.; Ma, Z.; Zhao, X. dForml(KNN)-PseAAC: Detecting formylation sites from protein sequences using K-nearest neighbor algorithm via Chou’s 5-step rule and pseudo components. J. Theor. Biol.,  2019, 470, 43-49.
[http://dx.doi.org/10.1016/j.jtbi.2019.03.011] [PMID: 30880183] 
[93] 
Cardoso, I. Analysis of machine learning algorithms for diagnosis of diffuse lung diseases. Methods Inf. Med.,,   2018, 57(5-06), 272-279.
[94] 
Youmans, M.; Spainhour, J.C.G.; Qiu, P. Classification of antibacterial peptides using long short-term memory recurrent neural networks. IEEE/ACM Trans. Comput. Biol. Bioinform., 2019.
[http://dx.doi.org/10.1109/TCBB.2019.2903800] 
[95] 
Palmer, K.A.; Bollas, G.M. Active fault diagnosis for uncertain systems using optimal test designs and detection through classification. ISA Trans.,,   2019, S0019-0578(19), 30115-6.
[http://dx.doi.org/10.1016/j.isatra.2019.02.034] [PMID: 30850204] 
[96] 
Khan, Z.U.; Hayat, M.; Khan, M.A. Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model. J. Theor. Biol.,  2015, 365, 197-203.
[http://dx.doi.org/10.1016/j.jtbi.2014.10.014] [PMID: 25452135] 
[97] 
Hayat, M.; Khan, A. Mem-PHybrid: hybrid features-based prediction system for classifying membrane protein types. Anal. Biochem.,  2012, 424(1), 35-44.
[http://dx.doi.org/10.1016/j.ab.2012.02.007] [PMID: 22342883] 
[98] 
Miarka, B.; Sterkowicz-Przybycien, K.; Fukuda, D.H. Evaluation of Sex-Specific Movement Patterns in Judo Using Probabilistic Neural Networks. Mot. Contr.,  2017, 21(4), 390-412.
[http://dx.doi.org/10.1123/mc.2016-0007] [PMID: 27736312] 
[99] 
Liao, X.; Li, B.; Yang, B. A Novel Classification and identification scheme of emitter signals based on ward’s clustering and probabilistic neural networks with correlation analysis. Comput. Intell. Neurosci.,  2018, 2018 1458962
[http://dx.doi.org/10.1155/2018/1458962] [PMID: 30532768] 
[100] 
Specht, D.F. Probabilistic neural networks and the polynomial Adaline as complementary techniques for classification. IEEE Trans. Neural Netw.,  1990, 1(1), 111-121.
[http://dx.doi.org/10.1109/72.80210] [PMID: 18282828] 
[101] 
Chou, K.C.; Shen, H.B. Recent progress in protein subcellular location prediction. Anal. Biochem.,  2007, 370(1), 1-16.
[http://dx.doi.org/10.1016/j.ab.2007.07.006] [PMID: 17698024] 
[102] 
Chou, K.C.; Zhang, C.T. Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol.,  1995, 30(4), 275-349.
[http://dx.doi.org/10.3109/10409239509083488]] [PMID: 7587280] 
[103] 
Yang, H.; Tang, H.; Chen, X.X.; Zhang, C.J.; Zhu, P.P.; Ding, H.; Chen, W.; Lin, H. Identification of secretory proteins in mycobacterium tuberculosis using pseudo amino acid composition. BioMed Res. Int.,  2016, 2016, 5413903
[http://dx.doi.org/10.1155/2016/5413903] [PMID: 27597968] 
[104] 
Tang, H.; Chen, W.; Lin, H. Identification of immunoglobulins using Chou’s pseudo amino acid composition with feature selection technique. Mol. Biosyst.,  2016, 12(4), 1269-1275.
[http://dx.doi.org/10.1039/C5MB00883B] [PMID: 26883492] 
[105] 
Chen, X.X.; Tang, H.; Li, W.C.; Wu, H.; Chen, W.; Ding, H.; Lin, H. Identification of bacterial cell wall lyases via pseudo amino acid composition. BioMed Res. Int.,  2016, 2016 1654623
[http://dx.doi.org/10.1155/2016/1654623] [PMID: 27437396] 
[106] 
Feng, P.M.; Lin, H.; Chen, W. Identification of antioxidants from sequence information using naïve Bayes. Comput. Math. Methods Med.,  2013, 201, 3567529
[http://dx.doi.org/10.1155/2013/567529] [PMID: 24062796] 
[107] 
Feng, P.M.; Ding, H.; Chen, W.; Lin, H. Naïve Bayes classifier with feature selection to identify phage virion proteins. Comput. Math. Methods Med.,  2013, 2013, 530696
[http://dx.doi.org/10.1155/2013/530696] [PMID: 23762187] 
[108] 
Chen, W. Recent advances in machine learning methods for predicting heat shock proteins. Curr. Drug Metab.,  2018, 20(3), 224-228.
[PMID: 30378494] 
[109] 
Zuo, Y.C.; Peng, Y.; Liu, L.; Chen, W.; Yang, L.; Fan, G.L. Predicting peroxidase subcellular location by hybridizing different descriptors of Chou’ pseudo amino acid patterns. Anal. Biochem.,  2014, 458, 14-19.
[http://dx.doi.org/10.1016/j.ab.2014.04.032] [PMID: 24802134] 
[110] 
Manavalan, B.; Subramaniyam, S.; Shin, T.H.; Kim, M.O.; Lee, G. Machine-learning-based prediction of cell-penetrating peptides and their uptake efficiency with improved accuracy. J. Proteome Res.,  2018, 17(8), 2715-2726.
[http://dx.doi.org/10.1021/acs.jproteome.8b00148] [PMID: 29893128] 
[111] 
Su, R.; Hu, J.; Zou, Q.; Manavalan, B.; Wei, L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief. Bioinform., 2019.
[http://dx.doi.org/10.1093/bib/bby124] [PMID: 30649170] 
[112] 
Kumar, M.; Raghava, G.P. Prediction of nuclear proteins using SVM and HMM models. BMC Bioinformatics,  2009, 10, 22.
[http://dx.doi.org/10.1186/1471-2105-10-22] [PMID: 19152693] 
[113] 
Kumar, M.; Verma, R.; Raghava, G.P. Prediction of mitochondrial proteins using support vector machine and hidden Markov model. J. Biol. Chem.,  2006, 281(9), 5357-5363.
[http://dx.doi.org/10.1074/jbc.M511061200] [PMID: 16339140] 
[114] 
Kumar, M.; Gromiha, M.M.; Raghava, G.P. SVM based prediction of RNA-binding proteins using binding residues and evolutionary information. J. Mol. Recognit.,  2011, 24(2), 303-313.
[http://dx.doi.org/10.1002/jmr.1061] [PMID: 20677174] 
[115] 
Kumar, M.; Gromiha, M.M.; Raghava, G.P. Identification of DNA-binding proteins using support vector machines and evolutionary profiles. BMC Bioinformatics,  2007, 8, 463.
[http://dx.doi.org/10.1186/1471-2105-8-463] [PMID: 18042272] 
[116] 
Kumari, B.; Kumar, R.; Kumar, M. PalmPred: an SVM based palmitoylation prediction method using sequence profile information. PLoS One,  2014, 9(2) e89246
[http://dx.doi.org/10.1371/journal.pone.0089246] [PMID: 24586628] 
[117] 
Kumar, R.; Jain, S.; Kumari, B.; Kumar, M. Protein sub-nuclear localization prediction using SVM and Pfam domain information. PLoS One,  2014, 9(6) e98345
[http://dx.doi.org/10.1371/journal.pone.0098345] [PMID: 24897370] 
[118] 
Chen, W.; Yang, H.; Feng, P.; Ding, H.; Lin, H. iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties. Bioinformatics,  2017, 33(22), 3518-3523.
[http://dx.doi.org/10.1093/bioinformatics/btx479 ] [PMID: 28961687] 
[119] 
Feng, P.M.; Chen, W.; Lin, H.; Chou, K.C. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal. Biochem.,  2013, 442(1), 118-125.
[http://dx.doi.org/10.1016/j.ab.2013.05.024] [PMID: 23756733] 
[120] 
Gao, Q.B.; Jin, Z.C.; Ye, X.F.; Wu, C.; Lu, J.; He, J. Improving the classification of nuclear receptors with feature selection. Protein Pept. Lett.,  2009, 16(7), 823-829.
[http://dx.doi.org/10.2174/092986609788681733] [PMID: 19601913] 
[121] 
Yang, H.; Lv, H.; Ding, H.; Chen, W.; Lin, H. iRNA-2OM: A sequence-based predictor for identifying 2′-O-methylation sites in homo sapiens. J. Comput. Biol.,  2018, 25(11), 1266-1277.
[http://dx.doi.org/10.1089/cmb.2018.0004] [PMID: 30113871] 
[122] 
Tang, H.; Zhao, Y.W.; Zou, P.; Zhang, C.M.; Chen, R.; Huang, P.; Lin, H. HBPred: a tool to identify growth hormone-binding proteins. Int. J. Biol. Sci.,  2018, 14(8), 957-964.
[http://dx.doi.org/10.7150/ijbs.24174] [PMID: 29989085] 
[123] 
Feng, C.Q. iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators. Bioinformatics,  2019, 35(9), 1469-1477.
[PMID: 30247625] 
[124] 
Zhang, T.; Tan, P.; Wang, L.; Jin, N.; Li, Y.; Zhang, L.; Yang, H.; Hu, Z.; Zhang, L.; Hu, C.; Li, C.; Qian, K.; Zhang, C.; Huang, Y.; Li, K.; Lin, H.; Wang, D. RNALocate: a resource for RNA subcellular localizations. Nucleic Acids Res.,  2017, 45(D1), D135-D138.
[PMID: 27543076] 
[125] 
Yi, Y.; Zhao, Y.; Li, C.; Zhang, L.; Huang, H.; Li, Y.; Liu, L.; Hou, P.; Cui, T.; Tan, P.; Hu, Y.; Zhang, T.; Huang, Y.; Li, X.; Yu, J.; Wang, D. RAID v2.0: an updated resource of RNA-associated interactions across organisms. Nucleic Acids Res.,  2017, 45(D1), D115-D118.
[http://dx.doi.org/10.1093/nar/gkw1052] [PMID: 27899615] 
[126] 
Liang, Z.Y.; Lai, H.Y.; Yang, H.; Zhang, C.J.; Yang, H.; Wei, H.H.; Chen, X.X.; Zhao, Y.W.; Su, Z.D.; Li, W.C.; Deng, E.Z.; Tang, H.; Chen, W.; Lin, H. Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics,  2017, 33(3), 467-469.
[PMID: 28171531] 
[127] 
Zhu, X.J. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl. Base. Syst.,  2019, 163, 787-793.
[http://dx.doi.org/10.1016/j.knosys.2018.10.007] 
[128] 
Lv, H.; Zhang, Z.M.; Li, S.H.; Tan, J.X.; Chen, W.; Lin, H. Evaluation of different computational methods on 5-methylcytosine sites identification. Brief. Bioinform., 2019.
[http://dx.doi.org/10.1093/bib/bbz048] [PMID: 31157855] 
[129] 
Tan, J.X.; Li, S.H.; Zhang, Z.M.; Chen, C.X.; Chen, W.; Tang, H.; Lin, H. Identification of hormone binding proteins based on machine learning methods. Math. Biosci. Eng.,  2019, 16(4), 2466-2480.
[http://dx.doi.org/10.3934/mbe.2019123] [PMID: 31137222] 
[130] 
Chen, W.; Ding, H.; Zhou, X.; Lin, H.; Chou, K.C. iRNA(m6A)-PseDNC: Identifying N6-methyladenosine sites using pseudo dinucleotide composition. Anal. Biochem.,  2018, 561-562, 59-65.
[http://dx.doi.org/10.1016/j.ab.2018.09.002] [PMID: 30201554] 
[131] 
Cheng, L.; Hu, Y.; Sun, J.; Zhou, M.; Jiang, Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics,  2018, 34(11), 1953-1956.
[http://dx.doi.org/10.1093/bioinformatics/bty002] [PMID: 29365045] 
[132] 
Cheng, L.; Yang, H.; Zhao, H.; Pei, X.; Shi, H.; Sun, J.; Zhang, Y.; Wang, Z.; Zhou, M. MetSigDis: a manually curated resource for the metabolic signatures of diseases. Brief. Bioinform.,  2019, 20(1), 203-209.
[http://dx.doi.org/10.1093/bib/bbx103] [PMID: 28968812] 
[133] 
Stephenson, N. Survey of Machine Learning Techniques in Drug Discovery. Curr. Drug Metab.,  2019, 20(3), 185-193.
[134] 
Hou, J.; Wu, T.; Cao, R.; Cheng, J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins,  2019, 87(12), 1165-1178.
[http://dx.doi.org/10.1002/prot.25697] [PMID: 30985027] 
[135] 
Manavalan, B.; Lee, J.; Lee, J. Random forest-based protein model quality assessment (RFMQA) using structural features and potential energy terms. PLoS One,  2014, 9(9), e106542.
[http://dx.doi.org/10.1371/journal.pone.0106542] [PMID: 25222008] 
[136] 
Manavalan, B.; Shin, T.H.; Kim, M.O.; Lee, G. AIPpred: Sequence-Based Prediction of Anti-inflammatory Peptides Using Random Forest. Front. Pharmacol.,  2018, 9, 276.
[http://dx.doi.org/10.3389/fphar.2018.00276] [PMID: 29636690] 
[137] 
Long, H.; Liao, B.; Xu, X.; Yang, J. A Hybrid deep learning model for predicting protein hydroxylation sites. Int. J. Mol. Sci.,  2018, 19(9), E2817.
[http://dx.doi.org/10.3390/ijms19092817] [PMID: 30231550] 
[138] 
Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform.,  2017, 18(5), 851-869.
[PMID: 27473064] 
[139] 
Wang, S.; Sun, S.; Xu, J. Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins,  2018, 86(Suppl. 1), 67-77.
[http://dx.doi.org/10.1002/prot.25377] [PMID: 28845538] 
[140] 
Zou, Q.; Xing, P.; Wei, L.; Liu, B. Gene2vec: gene subsequence embedding for prediction of mammalian N6-methyladenosine sites from mRNA. RNA,  2019, 25(2), 205-218.
[http://dx.doi.org/10.1261/rna.069112.118] [PMID: 30425123] 
[141] 
Zhang, Z. Deep learning in omics: a survey and guideline. Brief. Funct. Genomics,  2019, 18(1), 41-57.
[142] 
Chen, W.; Lin, H.; Feng, P.M.; Ding, C.; Zuo, Y.C.; Chou, K.C. iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS One,  2012, 7(10), e47843.
[http://dx.doi.org/10.1371/journal.pone.0047843] [PMID: 23144709] 
[143] 
Dao, F.Y. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics,  2019, 35(12), 2075-2083.
[PMID: 30428009] 
Journal: Medicinal Chemistry
Publisher Name: Bentham Science Publisher
Print ISSN: 1573-4064
Online ISSN: 1875-6638
DOI: https://dx.doi.org/10.2174/1573406415666191004125551