Abstract
Transmembrane protein topology prediction methods play important roles in structural biology, because determining the structure of these proteins is extremely difficult with the common biophysical, biochemical and molecular biological methods. The need for accurate prediction methods is high, as the number of known membrane protein structures falls far behind the estimated number of these proteins in various genomes. The accuracy of these prediction methods appears to be higher than that of most prediction methods applied to globular proteins; however, it decreases slightly as the number of known structures increases. Unfortunately, most prediction algorithms use common machine learning techniques, and they do not reveal why topologies are predicted with such a high success rate or which biophysical or biochemical properties are important for achieving this level of accuracy. Incorporating the topology data determined so far into the prediction methods as constraints helps to reach even higher prediction accuracy; therefore, the collection of such topology data is also an important issue.
Keywords: Transmembrane protein, topology prediction, machine learning algorithm, hidden Markov model, support vector machine, Helical, Lipid bilayers, PDBTM database, Protein Data Bank, Integral Membrane Proteins, Proline kinks, bacteriorhodopsin, lutropin/choriogonadotropin receptor, Transmembrane Folds, immuno-localization, molecular biology modifications of proteins, fusion proteins, Escherichia coli, Proteins with Ambiguous Orientation, Globular Proteins, Saccharomyces cerevisiae, SVMtop, Signal Peptide Predictions, Topography Predictions, Dense Alignment Surface (DAS), latent semantic analysis, higher order statistics, evidence-theoretic K-nearest neighbor prediction algorithm, Consensus prediction methods, Benchmark Sets, prediction accuracies, SwissProt annotations, per segment, per protein, Reentrant Loop Predictions, Constrained Predictions, Genome Wide Topology Predictions