Abstract
Chemical information can inform biology when it is used to develop bioinformatic tools. One area where such tools are valuable is the study of protein interactions mediated by linear motifs. Linear motifs are short sequences, found mostly in disordered regions of proteins, that function in cellular signaling and regulation by binding to protein interaction domains or by serving as targets of post-translational modification. Linear motifs are difficult to study both experimentally and computationally: their small size makes them hard to identify, and their binding specificity is determined by several factors acting in concert. We discuss the different ways linear motifs can be represented computationally and how computational approaches can integrate the different specificity-determining factors. We illustrate these issues with our own work, focusing on the use of three-dimensional structural information in predicting protein phosphorylation sites and on the integration of diverse types of data in predicting nuclear localization. Computational approaches will play an increasing role in the future, allowing new relationships and a system-wide understanding to be extracted from the large datasets becoming available through high-throughput studies.
Keywords: computational prediction, data integration, linear motif, peptide:protein interactions, bioinformatic tools, computational methods, nuclear localization, computational approaches, eukaryotic linear motifs, proximal motifs, post-translational modifications, constraints extrinsic, potential therapeutics