
Current Genomics


ISSN (Print): 1389-2029
ISSN (Online): 1875-5488

Research Article

DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble Classification Model for the Identification of Bamboo Species from Genomic Sequences

Author(s): Aditi R. Durge and Deepti D. Shrimankar*

Volume 25, Issue 3, 2024

Published on: 01 February, 2024

Pages: 185-201 (17 pages)

DOI: 10.2174/0113892029268176240125055419


Abstract

Background: Analyzing genomic sequences plays a crucial role in understanding biological diversity and classifying Bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as complexity, low accuracy, and the need for constant reconfiguration in response to evolving genomic datasets.

Aim: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of Bamboo species from genomic sequences.

Methods: The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic feature selection. This process maximizes inter-class variance, leading to the selection of informative N-gram feature sets. Subsequently, intra-class variance levels are used to create optimal training and validation sets, ensuring comprehensive coverage of class-specific features. The selected features are then processed through an ensemble classification layer, combining multiple stratification models for species-specific categorization.
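
The pipeline described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the N-gram length, the Genetic Algorithm operators, or how the ensemble combines its members, so the bigram vocabulary, the fixed subset size `k`, the crossover and mutation rules, and the majority vote below are all assumptions, and every function name is hypothetical.

```python
import random
from collections import Counter
from itertools import product


def ngram_features(seq, n=2, alphabet="ACGT"):
    """Normalized counts of every length-n substring over the alphabet."""
    vocab = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = max(sum(counts[g] for g in vocab), 1)  # avoid division by zero
    return [counts[g] / total for g in vocab]


def inter_class_variance(X, y, mask):
    """Sum over selected features of the variance of the per-class means."""
    classes = sorted(set(y))
    score = 0.0
    for j, keep in enumerate(mask):
        if not keep:
            continue
        means = []
        for c in classes:
            vals = [x[j] for x, label in zip(X, y) if label == c]
            means.append(sum(vals) / len(vals))
        grand = sum(means) / len(means)
        score += sum((m - grand) ** 2 for m in means) / len(means)
    return score


def ga_select(X, y, k=4, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm (assumed operators): evolve masks with exactly
    k selected features, using inter-class variance as the fitness."""
    d = len(X[0])
    if not 0 < k < d:
        raise ValueError("k must satisfy 0 < k < number of features")
    rng = random.Random(seed)

    def fitness(mask):
        return inter_class_variance(X, y, mask)

    def random_mask():
        chosen = set(rng.sample(range(d), k))
        return [j in chosen for j in range(d)]

    def crossover(a, b):
        # Child draws its k features from the union of both parents' features.
        pool = sorted(j for j in range(d) if a[j] or b[j])
        chosen = set(rng.sample(pool, k))
        return [j in chosen for j in range(d)]

    def mutate(mask):
        # Swap one selected feature for one unselected feature.
        on = [j for j, bit in enumerate(mask) if bit]
        off = [j for j, bit in enumerate(mask) if not bit]
        child = list(mask)
        child[rng.choice(on)] = False
        child[rng.choice(off)] = True
        return child

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness)


def majority_vote(predictions):
    """Combine per-model label lists by majority vote (assumed ensemble rule).
    Ties go to the label seen first among the models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

A caller would convert each sequence with `ngram_features`, pass the resulting matrix and species labels to `ga_select`, train several classifiers on the selected columns, and merge their outputs with `majority_vote`. The intra-class variance step that the Methods use to build training and validation sets is not sketched here because the abstract gives no detail on it.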

Results: Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM achieves remarkable improvements in accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC performance (4.5%). Importantly, the model maintains its performance even with an increased number of species classes due to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm Model.
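
For readers who want to reproduce this kind of comparison, the following sketch shows one common way to compute accuracy, precision, and recall for a multi-class species classifier. The abstract does not say how the metrics were averaged across classes or whether the reported gains are absolute or relative, so macro averaging is an assumption here, and `macro_metrics` is a hypothetical helper name. AUC is omitted because it requires per-class scores rather than hard labels.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall (assumed averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # A class with no predictions (or no true samples) contributes 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(classes),
        "recall": sum(recalls) / len(classes),
    }
```

Calling `macro_metrics` on the predictions of two models over the same test set and subtracting the results gives an absolute difference in percentage points, which is one possible reading of the improvements quoted above.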

Conclusion: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.



© 2024 Bentham Science Publishers