Abstract
As an alternative to X-ray crystallography, nuclear magnetic resonance (NMR) has emerged as the method of choice for studying both protein structure and dynamics in solution. However, little work using computational models such as the Gaussian network model (GNM) and machine learning approaches has focused on predicting the residue flexibility of NMR-derived protein structures, where flexibility is represented by the root mean square deviation (RMSD) with respect to the average structure. We provide a large-scale comparison of computational models, including GNM, parameter-free GNM, and several linear regression models that use local solvent exposure as input, based on a dataset of 1609 protein chains whose structures were resolved by NMR. The results again confirmed that GNM outputs correlate better with raw RMSD values than with B-factors from X-ray data. However, the parameter-free GNM and the solvent-exposure-based linear regression models performed worse than GNM in predicting RMSD, contrary to results obtained with X-ray data. The discrepancy in residue flexibility prediction between NMR and X-ray data is likely attributable to a combination of their physical and methodological differences.
Keywords: Residue flexibility, root mean square deviation, Gaussian network model, linear regression model, Debye–Waller temperature factor, microtubules, crystallographic structures, thermal stability, molecular dynamic simulation, Vermont Company
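The two quantities compared in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that the NMR models are already superimposed, that each residue is represented by its C-alpha atom, and that the GNM contact cutoff is 7.0 Å (a hypothetical default chosen for illustration). The function names are likewise illustrative.

```python
import numpy as np


def per_residue_rmsd(models):
    """Per-residue RMSD with respect to the average structure.

    models: array-like of shape (M, N, 3) holding C-alpha coordinates of
    M NMR models with N residues, assumed to be superimposed already.
    """
    models = np.asarray(models, dtype=float)
    mean_structure = models.mean(axis=0)                       # (N, 3)
    sq_dev = ((models - mean_structure) ** 2).sum(axis=2)      # (M, N)
    return np.sqrt(sq_dev.mean(axis=0))                        # (N,)


def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuations from a Gaussian network model.

    coords: array-like of shape (N, 3) with C-alpha coordinates.
    cutoff: contact distance in angstroms (assumed value, not from the paper).
    Returns the diagonal of the pseudo-inverse of the Kirchhoff matrix,
    i.e. fluctuations up to a constant scale factor.
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Off-diagonal entries: -1 for residue pairs in contact, 0 otherwise.
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    # Diagonal entries: number of contacts of each residue.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # Sum over non-zero modes; near-zero eigenvalues are discarded.
    evals, evecs = np.linalg.eigh(kirchhoff)
    nonzero = evals > 1e-8
    return (evecs[:, nonzero] ** 2 / evals[nonzero]).sum(axis=1)


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D profiles."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    # Synthetic toy ensemble: a straight 10-residue chain plus random noise.
    rng = np.random.default_rng(0)
    base = np.column_stack([np.arange(10) * 3.8, np.zeros(10), np.zeros(10)])
    ensemble = base + rng.normal(scale=0.5, size=(20, 10, 3))
    rmsd = per_residue_rmsd(ensemble)
    fluct = gnm_fluctuations(ensemble.mean(axis=0))
    print("correlation:", pearson(fluct, rmsd))
```

For a real comparison, the coordinates would come from the models of an NMR entry, and the correlation would be computed per chain between the GNM profile and the per-residue RMSD profile.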