Abstract
Discriminating outer membrane proteins from globular proteins (GPs) and from other types of membrane proteins in genomic sequences is an important and active research topic. In this paper, a measure based on information discrepancy is proposed and applied to the discrimination of outer membrane proteins. Unlike previous methods, which are based on amino acid composition, our approach compares subsequence distributions and thus takes into account the effect of residue order in protein primary structures. As a result, the new approach outperforms all previous methods on the same benchmark datasets. In particular, the proposed approach correctly identifies outer membrane proteins with an accuracy of 99% on the training set of 337 proteins and correctly excludes GPs with an accuracy of 86% on a non-redundant dataset of 668 proteins. Furthermore, the method correctly excludes α-helical membrane proteins with an accuracy of 100%.
Keywords: Outer membrane protein, subsequence distribution, FDOD measure