Abstract
Introduction: Identifying and predicting protein-coding regions within DNA sequences plays a pivotal role in genomic research. This paper introduces a hybrid approach to this task that combines a digital bandpass filter with wavelet transforms and several spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction by extracting fine details that may be obscured in the original DNA sequences.
Methods: The proposed method applies Haar and Daubechies wavelet transforms, both nonparametric and parametric spectral estimation methods, and a digital bandpass filter to detect peaks in exon regions. Symbolic DNA sequences are converted into numerical values using the Electron-Ion Interaction Potential (EIIP) method, and sum-of-sinusoids (SoS) mathematical models with optimized parameters are used to model the resulting DNA sequences and support accurate gene identification.
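As an illustration of the first steps described above, the following minimal sketch maps a DNA string to its EIIP values and computes a sliding-window periodogram (a nonparametric spectral estimate). It is not the authors' implementation. The function names, the window length, and the choice to evaluate the spectrum at a frequency of 1/3 cycles per base (the period-3 component commonly associated with coding regions) are assumptions made for this example only; the wavelet, bandpass filter, and SoS stages are not shown.

```python
import numpy as np

# Standard EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a symbolic DNA string to its EIIP numerical sequence."""
    return np.array([EIIP[base] for base in seq.upper()], dtype=float)


def period3_power(x, window=351):
    """Sliding-window DFT power at 1/3 cycles per base (hypothetical helper).

    `window` is an illustrative choice, not a value taken from the paper.
    Returns one power value per window start position.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(window)
    kernel = np.exp(-2j * np.pi * n / 3)  # DFT basis vector at frequency 1/3
    power = []
    for start in range(len(x) - window + 1):
        segment = x[start:start + window]
        segment = segment - segment.mean()  # remove the DC offset
        power.append(np.abs(np.dot(segment, kernel)) ** 2)
    return np.array(power)


# Toy usage: a sequence with a strict 3-base repeat versus a constant one.
periodic = eiip_encode("ATG" * 40)
constant = eiip_encode("A" * 120)
p_periodic = period3_power(periodic, window=30)
p_constant = period3_power(constant, window=30)
```

In this toy case the repeating sequence yields a clearly nonzero power at the chosen frequency, while the constant sequence yields essentially zero; peaks in such a power track are what a peak-detection stage would then examine.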
Results: The approach yields a substantial improvement in identification accuracy for protein-coding regions. For peak location detection, applying the Haar and Daubechies wavelet transforms improves the accuracy of peak localization by approximately (0.01, 3-5 dB). With nonparametric and parametric spectral estimation techniques, peak location improves by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy than existing methods.
Conclusion: These findings address gaps in DNA sequence analysis and offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented is a robust contribution to genomic analysis techniques.