Current Protein & Peptide Science

ISSN (Print): 1389-2037
ISSN (Online): 1875-5550

Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques

Author(s): Pawel Smialowski, Antonio J. Martin-Galiano, Jurgen Cox and Dmitrij Frishman

Volume 8, Issue 2, 2007

Pages: 121-133 (13 pages)

DOI: 10.2174/138920307780363398

Abstract

Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is high demand for sequence-based methods that can predict experimentally tractable proteins and filter out potentially difficult targets at different stages of the structural genomics pipeline. Simple empirical rules based on anecdotal evidence are increasingly being superseded by rigorous machine-learning algorithms. Although simpler methods are easier for humans to interpret, more sophisticated formal algorithms have superior classification power. The rapidly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining with machine learning techniques, which in turn improves the quality of classifiers. For example, current solubility prediction methods are reaching accuracies of over 70%. Furthermore, automated feature selection gives better insight into the nature of the correlation between amino acid sequence and experimental outcome. In this review we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins, with a special focus on publicly available resources. We also describe experimental data repositories and the machine learning techniques used for classification and feature selection.
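The general workflow the abstract describes (sequence → numeric features → trained classifier → ranked features) can be illustrated with a minimal Python sketch. It assumes amino acid composition as the only feature set. It also uses a nearest-centroid classifier as a simple stand-in for the more sophisticated learners the review discusses. The function names (`composition`, `train`, `predict`, `rank_features`) and the toy training sequences are hypothetical illustrations, not taken from any method covered in the review, and the outputs say nothing about real prediction accuracy.

```python
"""Toy sequence-based tractability classifier: an illustrative sketch only."""
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in seq.

    Letters outside AMINO_ACIDS (e.g. X, B) are ignored; an empty or
    fully non-standard sequence yields an all-zero vector.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally long feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(positives: list[str], negatives: list[str]) -> tuple[list[float], list[float]]:
    """'Train' a nearest-centroid model: one mean composition vector per class."""
    return (centroid([composition(s) for s in positives]),
            centroid([composition(s) for s in negatives]))


def predict(model: tuple[list[float], list[float]], seq: str) -> bool:
    """True if seq is closer (squared Euclidean) to the positive-class centroid."""
    pos, neg = model
    x = composition(seq)
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
    return d_pos < d_neg


def rank_features(model: tuple[list[float], list[float]]) -> list[str]:
    """Filter-style feature ranking: residues ordered by |centroid difference|, largest first."""
    pos, neg = model
    scores = {a: abs(p - n) for a, p, n in zip(AMINO_ACIDS, pos, neg)}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data (NOT from any structural genomics repository):
# 'soluble' examples rich in charged residues, 'insoluble' rich in hydrophobic ones.
soluble = ["EEEK", "EEKK"]
insoluble = ["LLLI", "LLLL"]
model = train(soluble, insoluble)

print(predict(model, "EKEK"))    # True  (assigned to the 'soluble' class)
print(predict(model, "LIVL"))    # False (assigned to the 'insoluble' class)
print(rank_features(model)[:4])  # ['L', 'E', 'K', 'I']
```

Ranking residues by the gap between class centroids is one of the simplest filter-style feature selection schemes. Wrapper or embedded methods would instead score features by their effect on a trained classifier's performance.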

Keywords: Structural genomics, machine learning, experimental success rate, target selection


© 2024 Bentham Science Publishers