Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers

Author(s): Necla Nisa Soylu and Emre Sefer*

Volume 19, Issue 9, 2024

Published on: 01 February, 2024

Pages: 810-824 (15 pages)

DOI: 10.2174/0115748936283134240109054157

Abstract

Introduction: Recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have achieved the best performance on several language tasks by contextualizing word embeddings into better dynamic representations. Their protein-specific versions, such as ProtBERT, generate dynamic protein sequence embeddings, which have improved performance on several bioinformatics tasks. In addition, a number of protein post-translational modifications play prominent roles in cellular processes such as development and differentiation. Current biological experiments can detect these modifications, but only over a long duration and at a significant cost.
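As a concrete illustration of the contextual protein embeddings referred to above, the following minimal sketch extracts per-residue ProtBERT vectors with the Hugging Face transformers library. The example sequence and the decision to drop the special tokens are illustrative assumptions, not settings taken from the paper.

```python
# Minimal sketch: per-residue ProtBERT embeddings via Hugging Face transformers.
# The sequence below is a placeholder, not one of the paper's datasets.
import re
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
# ProtBERT expects space-separated residues; rare amino acids are mapped to X.
spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

with torch.no_grad():
    inputs = tokenizer(spaced, return_tensors="pt")
    outputs = model(**inputs)

# One 1024-dimensional contextual vector per residue, after dropping [CLS]/[SEP].
per_residue = outputs.last_hidden_state[0, 1:-1]
print(per_residue.shape)  # (sequence length, 1024)
```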

Methods: In this paper, to understand the accompanying biological processes concisely and more rapidly, we propose DEEPPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Unlike current methods, DEEPPTM improves modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and it reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications across different species.
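To make the ProtBERT-plus-ViT combination more tangible, the sketch below feeds per-residue ProtBERT vectors into a ViT-style attention classifier for a candidate PTM site. The layer sizes, window length, and use of a class token are assumptions for illustration only; they are not the exact DEEPPTM architecture.

```python
# Hypothetical ViT-style classifier over a window of ProtBERT residue embeddings
# centered on a candidate modification site (sizes are illustrative assumptions).
import torch
import torch.nn as nn

class ViTStylePTMClassifier(nn.Module):
    def __init__(self, embed_dim=1024, num_heads=8, depth=4, window_len=33):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, window_len + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 1)  # modified vs. unmodified site

    def forward(self, residue_embeddings):
        # residue_embeddings: (batch, window_len, 1024) ProtBERT vectors
        batch = residue_embeddings.size(0)
        cls = self.cls_token.expand(batch, -1, -1)
        x = torch.cat([cls, residue_embeddings], dim=1) + self.pos_embed
        x = self.encoder(x)
        return torch.sigmoid(self.head(x[:, 0]))  # score read from the class token

model = ViTStylePTMClassifier()
scores = model(torch.randn(2, 33, 1024))  # two example sequence windows
```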

Results: Under 10-fold cross-validation, the human and mouse ROC AUCs for predicting succinylation modifications were 0.988 and 0.965, respectively. Similarly, we obtained ROC AUC scores of 0.982, 0.955, and 0.953 for inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM reduces the time spent on laboratory experiments while outperforming competing methods as well as baselines on inferring all four modification sites. In our setting, attention-based deep learning methods such as vision transformers appear better suited to learning from ProtBERT features than more traditional deep learning and machine learning techniques.
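For reference, the sketch below shows how a 10-fold cross-validated ROC AUC of the kind reported above can be computed with scikit-learn. The features, labels, and classifier are placeholders standing in for the actual embeddings and model, not the DEEPPTM pipeline itself.

```python
# Illustrative 10-fold cross-validated ROC AUC with placeholder data and a
# simple baseline classifier (not the DEEPPTM model or datasets).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X = np.random.randn(500, 1024)           # placeholder embedding features
y = np.random.randint(0, 2, size=500)    # placeholder PTM-site labels

aucs = []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))

print(f"mean ROC AUC over 10 folds: {np.mean(aucs):.3f}")
```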

Conclusion: Furthermore, the protein-specific ProtBERT model yields more effective embeddings than the original BERT for PTM prediction tasks. Our code and datasets are available at https://github.com/seferlab/deepptm.


Rights & Permissions Print Cite
© 2024 Bentham Science Publishers | Privacy Policy