Abstract
Background: Sphingomonas species are a microbial resource used for the biodegradation of aromatic compounds. In computational biology, identifying protein-coding domains in the Sphingomonas genome is known to be a challenging problem.
Objective: To address this challenge, we propose a novel method that predicts protein-coding regions in the Sphingomonas genome using 3-base periodicity.
Method: DNA sequences are first transformed into a wavelet by a so-called 3-base characteristics strategy. Sliding windows of fixed length are then used to identify protein-coding regions; the initial window size and the threshold values are set using experimentally verified protein data from the NCBI library.
Results: A protein-coding domain that has been experimentally verified in congeneric families of Sphingomonas is identified in the Sphingomonas genome.
Conclusion: The identified domain is highly likely to encode proteins with similar functions. In addition, potential protein-coding regions are marked by narrowing the forecast areas, and an extensible sliding window strategy is then used to improve predictive accuracy.
Keywords: Protein identification, Sphingomonas, 3-base periodicity, sliding window, sequence mapping, extensible strategy.
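The sliding-window idea summarized in the Method can be illustrated with a minimal sketch. It does not reproduce the 3-base characteristics mapping used in this work. Instead it assumes a common textbook measure of 3-base periodicity: the Fourier power at frequency 1/3 of the four base indicator sequences. The function names `period3_power` and `scan_windows`, as well as the window size, step, and threshold in the usage lines, are hypothetical placeholders and not the values calibrated from NCBI data.

```python
import cmath


def period3_power(window: str) -> float:
    """Fourier power at frequency 1/3, summed over the A, C, G, T
    indicator sequences and divided by the window length.
    (Assumed measure; not the mapping used in the paper.)"""
    n = len(window)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3 of the 0/1 indicator of this base
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(window)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


def scan_windows(seq: str, window: int, step: int, threshold: float):
    """Slide a fixed-length window along seq and return (start, score)
    for every window whose period-3 score exceeds the threshold."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        score = period3_power(seq[start:start + window])
        if score > threshold:
            hits.append((start, score))
    return hits


# Toy input: a strongly 3-periodic stretch flanked by non-periodic runs.
toy = "A" * 30 + "ATG" * 10 + "A" * 30
# Hypothetical parameters, chosen only for this toy sequence.
hits = scan_windows(toy, window=30, step=3, threshold=5.0)
best_start = max(hits, key=lambda h: h[1])[0]
```

In this toy input the highest-scoring window coincides with the periodic stretch. On real genomic data the window length and threshold would need to be calibrated against known coding regions, as the Method describes.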