Abstract
Background: Analysis of genomic sequences plays a crucial role in understanding biological diversity and in classifying bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as high complexity, low accuracy, and the need for constant reconfiguration as genomic datasets evolve.
Aim: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of bamboo species from genomic sequences.
Methods: DHFS-ECM uses a Genetic Algorithm to perform dual heuristic feature selection. This process maximizes inter-class variance, which leads to the selection of informative N-gram feature sets. Intra-class variance levels are then used to construct optimal training and validation sets that together cover the class-specific features. Finally, the selected features are passed to an ensemble classification layer that combines multiple classification models for species-level categorization.
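As an illustration only, the pipeline described in the Methods can be sketched in Python under several assumptions that are not taken from the study: sequences are encoded as relative N-gram (k-mer) frequencies, the genetic algorithm searches fixed-size feature subsets whose fitness is the summed inter-class (between-class) variance, and the ensemble layer is a hypothetical majority vote over three nearest-centroid classifiers that differ only in their Minkowski distance exponent. All function names, parameter values, and the toy sequences are hypothetical, and the intra-class-variance training/validation split is not reproduced.

```python
# Illustrative sketch only: names, parameters and ensemble members are hypothetical,
# not the DHFS-ECM authors' implementation.
import random
from collections import Counter
from itertools import product
from statistics import mean

ALPHABET = "ACGT"


def ngram_profile(seq, n=2):
    """Relative frequency of each of the 4**n DNA n-grams, in a fixed order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts[k] for k in kmers)  # n-grams containing N etc. are ignored
    return [counts[k] / total if total else 0.0 for k in kmers]


def between_class_variance(X, y, j):
    """Class-size-weighted spread of the class means of feature j (inter-class variance)."""
    grand = mean(row[j] for row in X)
    total = 0.0
    for c in sorted(set(y)):
        vals = [row[j] for row, lab in zip(X, y) if lab == c]
        total += len(vals) * (mean(vals) - grand) ** 2
    return total / len(X)


def ga_select(X, y, k, pop_size=20, generations=30, mut_rate=0.2, seed=0):
    """GA over k-feature subsets (assumes k < number of features);
    fitness = summed inter-class variance of the chosen features."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    scores = [between_class_variance(X, y, j) for j in range(n_feat)]

    def fitness(subset):
        return sum(scores[j] for j in subset)

    pop = [rng.sample(range(n_feat), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k distinct genes from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), k)
            if rng.random() < mut_rate:  # mutation: swap one gene for an unused feature
                unused = [j for j in range(n_feat) if j not in child]
                child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        pop = parents + children
    return sorted(max(pop, key=fitness))


def ensemble_predict(X, y, feats, x, p_values=(1, 2, 3)):
    """Majority vote of nearest-centroid members using different Minkowski exponents p."""
    classes = sorted(set(y))
    cents = {c: [mean(row[j] for row, lab in zip(X, y) if lab == c) for j in feats]
             for c in classes}
    votes = []
    for p in p_values:
        def dist(c, p=p):
            return sum(abs(x[j] - v) ** p for j, v in zip(feats, cents[c]))
        votes.append(min(classes, key=dist))
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    train = [("AAAAAAAAAT", "sp1"), ("AAAATAAAAA", "sp1"),
             ("CCCCCCCCCG", "sp2"), ("CCCCGCCCCC", "sp2")]
    X = [ngram_profile(s) for s, _ in train]
    y = [lab for _, lab in train]
    feats = ga_select(X, y, k=2)
    print(feats, ensemble_predict(X, y, feats, ngram_profile("AAAAAAAAAA")))
```

Because this simplified fitness scores each feature independently, a GA is not strictly necessary here (taking the top-k features by inter-class variance would suffice); it is kept to mirror the described search. A fitness that also penalizes redundancy among the selected N-grams is where a genetic search would justify its cost.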
Results: Comparative analysis against state-of-the-art methods shows that DHFS-ECM improves accuracy by 9.5%, precision by 5.9%, recall by 8.5%, and AUC by 4.5%. Importantly, the model maintains its performance as the number of species classes increases, owing to the continuous learning enabled by the Dual Heuristic Genetic Algorithm Model.
Conclusion: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.