Current Gene Therapy
Title: Deep Forest-based Prediction of Protein Subcellular Localization
Volume: 18 Issue: 5
Keywords: Protein subcellular localization, machine learning, deep forest, sequence information, UniProt, algorithm.
Abstract:
Motivation: Knowing a protein's correct subcellular localization is necessary for understanding its function and for revealing the mechanisms of the many human diseases caused by protein subcellular mislocalization; this knowledge is required before gene therapy can be used to treat such diseases. In addition, gene therapy is well known to be an effective way to overcome disease by targeting a gene therapy product to a specific subcellular compartment. Driven by the large increase in available genomic data, deep neural networks for protein function prediction have become increasingly popular because of their strong non-linear classification ability. However, they still have drawbacks, such as a large number of hyper-parameters and the need for a sufficient amount of labeled data.
Results: We present a deep forest-based protein localization algorithm that relies on sequence information. The prediction model uses a multi-layered network of random forests to identify a protein's subcellular compartment. The model was trained and tested on a protein dataset from the latest UniProt release, and we show that, given only the protein sequence, our deep forest predicts subcellular location with high accuracy, outperforming current state-of-the-art algorithms. Moreover, unlike deep neural networks, it has significantly fewer parameters and is much easier to train.
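The abstract describes the model only as "a random forest network with a multi-layered structure". The following is a minimal sketch assuming it follows the cascade design of the original deep forest (gcForest): each level holds several forests, each forest emits a class-probability vector, and the next level is trained on the raw features concatenated with those vectors, which are generated by k-fold cross-validation so that no training sample is scored by a forest that saw it. Everything here is an illustrative assumption rather than the authors' method: the 20-dimensional amino-acid composition encoding, the hypothetical helper names (`composition`, `Tree`, `grow_forest`, `forest_proba`, `CascadeForest`), the pair of random forests trying 4 and 1 features per split (a stand-in for gcForest's random forest plus completely-random tree forest), the fixed number of levels (gcForest instead keeps adding levels while validation accuracy improves), and the toy sequences with made-up labels. It is written in plain Python so that it runs without dependencies.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Hypothetical encoding: fraction of each standard amino acid in `seq`."""
    return [seq.upper().count(a) / max(len(seq), 1) for a in AMINO_ACIDS]


def gini(labels):
    return 1.0 - sum((c / len(labels)) ** 2 for c in Counter(labels).values())


class Tree:
    """CART-style tree grown to purity; each split tries a random subset of features."""
    def __init__(self, X, y, classes, n_feats, rng):
        self.classes, self.n_feats, self.rng = classes, n_feats, rng
        self.root = self._grow(X, y)

    def _grow(self, X, y):
        counts = Counter(y)
        leaf = [counts[c] / len(y) for c in self.classes]  # class distribution at this node
        if len(counts) == 1:
            return leaf
        best = None
        for f in self.rng.sample(range(len(X[0])), min(self.n_feats, len(X[0]))):
            for t in sorted({row[f] for row in X})[:-1]:  # skip the max so both sides are non-empty
                li = [i for i, row in enumerate(X) if row[f] <= t]
                ri = [i for i, row in enumerate(X) if row[f] > t]
                score = len(li) * gini([y[i] for i in li]) + len(ri) * gini([y[i] for i in ri])
                if best is None or score < best[0]:
                    best = (score, f, t, li, ri)
        if best is None:  # every sampled feature is constant at this node
            return leaf
        _, f, t, li, ri = best
        return (f, t, self._grow([X[i] for i in li], [y[i] for i in li]),
                self._grow([X[i] for i in ri], [y[i] for i in ri]))

    def proba(self, x):
        node = self.root
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if x[f] <= t else right
        return node


def grow_forest(X, y, classes, n_trees, n_feats, seed):
    """Random forest: each tree is grown on a bootstrap sample of (X, y)."""
    rng, trees = random.Random(seed), []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(Tree([X[i] for i in idx], [y[i] for i in idx], classes, n_feats, rng))
    return trees


def forest_proba(trees, x):
    ps = [tree.proba(x) for tree in trees]
    return [sum(col) / len(ps) for col in zip(*ps)]  # average class vector over trees


class CascadeForest:
    """Assumed gcForest-style cascade: levels see raw features + previous class vectors."""
    def __init__(self, n_levels=2, n_trees=10, k_folds=3, seed=0):
        self.n_levels, self.n_trees, self.k, self.seed = n_levels, n_trees, k_folds, seed

    def _grow_level(self, X, y, lv):
        # Stand-in for gcForest's forest pair: random forests trying 4 and 1 features per split.
        return [grow_forest(X, y, self.classes, self.n_trees, m, self.seed + 10 * lv + m)
                for m in (4, 1)]

    def fit(self, X, y):
        self.classes, self.levels, aug = sorted(set(y)), [], [list(x) for x in X]
        for lv in range(self.n_levels):
            vecs = [[] for _ in X]
            for fold in range(self.k):  # out-of-fold class vectors limit overfitting
                tr = [i for i in range(len(X)) if i % self.k != fold]
                for trees in self._grow_level([aug[i] for i in tr], [y[i] for i in tr], lv):
                    for i in range(fold, len(X), self.k):
                        vecs[i] += forest_proba(trees, aug[i])
            self.levels.append(self._grow_level(aug, y, lv))  # refit the level on all data
            aug = [list(x) + v for x, v in zip(X, vecs)]
        return self

    def predict(self, x):
        feats, n = list(x), len(self.classes)
        for level in self.levels:
            vecs = [p for trees in level for p in forest_proba(trees, feats)]
            feats = list(x) + vecs
        avg = [sum(vecs[j::n]) / len(level) for j in range(n)]  # mean of last level's vectors
        return self.classes[max(range(n), key=avg.__getitem__)]


polar, hydro = "DEKRHNQSTG", "AVLIMFWYCP"
seqs = [base * (1 + i % 2) + base[i] * 3 for i in range(6) for base in (polar, hydro)]
labels = ["toy_polar", "toy_hydrophobic"] * 6  # hypothetical labels, not real compartments
model = CascadeForest().fit([composition(s) for s in seqs], labels)
print(model.predict(composition("DEKRHNQSTGKKRR")))  # expected: toy_polar
```

In this sketch the only settings are the number of levels, trees per forest, features tried per split and cross-validation folds, which illustrates why a cascade of forests is simpler to configure than a deep neural network. A practical version would more likely build each level from scikit-learn's `RandomForestClassifier` and `ExtraTreesClassifier` and use a richer sequence encoding.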
About this article
Cite this article as:
Deep Forest-based Prediction of Protein Subcellular Localization. Current Gene Therapy 2018; 18(5). https://dx.doi.org/10.2174/1566523218666180913110949
DOI: https://dx.doi.org/10.2174/1566523218666180913110949
Print ISSN: 1566-5232
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5631