Abstract
Ongoing efforts to improve protein structure prediction stimulate the development of scoring functions and methods for model quality assessment (MQA) that can be used to rank and select the best protein models for further refinement. In this work, sequence-based prediction of relative solvent accessibility (RSA) is employed as a basis for a simple MQA method for soluble proteins, and subsequently extended to the much less explored case of (alpha-helical) membrane proteins. In analogy to soluble proteins, the level of lipid exposure of amino acid residues in transmembrane (TM) domains is captured in terms of the relative lipid accessibility (RLA), which is predicted from sequence using low-complexity Support Vector Regression (SVR) models. On an independent set of 23 TM proteins, the new SVR-based predictor yields a correlation coefficient (CC) of 0.56 between the predicted and observed RLA profiles, as opposed to a CC of 0.13 for a baseline predictor that utilizes the TMLIP2H empirical lipophilicity scale (with standard deviations of about 0.15). A simple MQA approach is then defined by ranking models of membrane proteins in terms of consistency between predicted and observed RLA profiles, as a measure of similarity to the native structure. The new method does not require a set of decoy models to optimize parameters, circumventing current limitations in this regard. Several different sets of models, including those generated by fragment-based folding simulations, and decoys obtained by swapping TM helices to mimic errors in template-based assignment, are used to assess the new approach. Predicted RLA profiles can be used to successfully discriminate near-native models from non-native decoys in most cases, significantly improving the separation of correctly and incorrectly folded models compared to a simple baseline approach that utilizes TMLIP2H.
As suggested by the robust performance of a simple MQA method for soluble proteins that utilizes more accurate RSA predictions, further significant improvements are likely to be achieved. The steady growth in the number of resolved membrane protein structures is expected to yield enhanced RLA predictions, facilitating further efforts to improve de novo and template-based prediction of membrane protein structure.
Keywords: Membrane proteins, model quality assessment, solvent accessibility, relative lipid accessibility, prediction, MQA, CASP, RLA, TM, SVR
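The ranking idea summarized in the abstract, scoring each candidate model by how consistent its observed RLA profile is with the sequence-based prediction, can be illustrated with a minimal sketch. It assumes that consistency is measured by the Pearson correlation coefficient; the abstract reports CC for predictor accuracy but does not state the exact consistency measure used for ranking. The function names `pearson_cc` and `rank_models`, and the toy profiles, are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no ranking signal; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_models(predicted_rla, observed_rla_by_model):
    """Rank models by agreement between predicted and observed RLA.

    predicted_rla: per-residue RLA predicted from sequence (one profile).
    observed_rla_by_model: mapping of model name to the per-residue RLA
        computed from that model's 3D coordinates (assumed precomputed).
    Returns (model name, CC) pairs, best-scoring model first.
    """
    scores = {
        name: pearson_cc(predicted_rla, profile)
        for name, profile in observed_rla_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical values, not taken from the paper).
predicted = [0.1, 0.8, 0.3, 0.9, 0.2]
models = {
    "model_a": [0.2, 0.7, 0.4, 0.8, 0.1],  # exposure pattern matches prediction
    "model_b": [0.9, 0.1, 0.7, 0.2, 0.8],  # exposure pattern is inverted
}
ranking = rank_models(predicted, models)
```

In this toy case `model_a` ranks first because its exposure pattern follows the predicted profile, while `model_b` is anti-correlated with it. Because the score depends only on one predicted profile and the model's own coordinates, a scheme of this kind needs no decoy set for parameter fitting, in line with the property noted in the abstract.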