Abstract
Membrane proteins, which are closely related to cellular biological functions, are vital components of cell membranes. Predicting whether a protein is a membrane protein, and which membrane protein type it belongs to, is a challenging but promising task. This paper proposes a novel membrane protein detection method that is more efficient and accurate than traditional technologies. Two methods were used to extract features from protein sequences: a 20-D feature was extracted from the position-specific scoring matrix (PSSM) of each protein, and a 188-D feature was extracted based on protein composition and physicochemical properties. These features were combined with a novel ensemble voting strategy, derived from the theorem of the minimal set of falsely classified samples, to improve classification performance. The proposed method offers efficient memory usage and accurate predictions. Using the 20-D feature under the jackknife test, the proposed method achieved 91.2% accuracy in distinguishing membrane proteins from non-membrane proteins and 86.1% accuracy in predicting membrane protein types. Two interesting discoveries are also presented: 1) approximately 12% of all enzymes are membrane proteins, and 2) membrane proteins make up a higher proportion of alternative splicing peptides than of normal proteins. A new membrane protein dataset containing 7388 membrane protein sequences was built from the latest Swiss-Prot database. Furthermore, a Web server and software package called BinMemPredict was developed and is freely accessible to the public at http://datamining.xmu.edu.cn/software/bmp.
Keywords: Alternative splicing, membrane, classification, ensemble, enzyme.
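The abstract names two technical ingredients without spelling them out: a 20-D feature derived from an L x 20 PSSM, and the jackknife (leave-one-out) test used to report accuracy. The sketch below illustrates both under stated assumptions. It assumes the 20-D feature is obtained by averaging each PSSM column over the sequence length, which is one common reduction; the paper's exact formula may differ. The helper names `pssm_20d`, `jackknife_accuracy` and `nearest_neighbor` are hypothetical, and the 1-nearest-neighbor classifier is a simple stand-in, not the paper's ensemble voting strategy.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def pssm_20d(pssm: Sequence[Sequence[float]]) -> Vector:
    """Collapse an L x 20 PSSM into a 20-D vector by averaging each column.

    Assumption: column averaging is used for illustration only; the paper's
    exact 20-D reduction may differ.
    """
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(20)]


def nearest_neighbor(train_x: List[Vector], train_y: List[int], query: Vector) -> int:
    """Hypothetical stand-in classifier: 1-nearest neighbor by squared Euclidean distance."""
    best = min(
        range(len(train_x)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(train_x[k], query)),
    )
    return train_y[best]


def jackknife_accuracy(
    features: List[Vector],
    labels: List[int],
    fit_predict: Callable[[List[Vector], List[int], Vector], int],
) -> float:
    """Jackknife (leave-one-out) test: each sample is predicted by a model
    built from all the remaining samples, and the fraction correct is returned."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

For example, `pssm_20d([[1.0] * 20, [3.0] * 20])` yields a vector of twenty `2.0` values, and `jackknife_accuracy(X, y, nearest_neighbor)` reports the leave-one-out accuracy of the stand-in classifier on a feature matrix `X` with labels `y`. Swapping in a different `fit_predict` callable is how a real classifier or ensemble would be evaluated under the same protocol.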