Abstract
Aim and Objective: Missense mutations (MMs) can cause a variety of human diseases by disabling proteins. Accurate prediction of MM pathogenicity is both important and challenging for protein function annotation and drug design. Although several computational methods achieve acceptable success rates, there is still room to improve prediction performance.
Materials and Methods: In the present study, we designed a new feature extraction method that accounts for the degree to which residues within the microenvironment of the mutation site affect that site. Stringent cross-validation and independent tests on benchmark datasets were performed to evaluate the efficacy of the proposed feature extraction method. Furthermore, three heterogeneous prediction models were trained and then combined into an ensemble for the final prediction. By combining the feature representation method with the classifier ensemble technique, we developed a novel MM predictor, called TargetMM, for distinguishing pathogenic mutations from neutral ones.
Results: Statistical comparison demonstrates that TargetMM outperforms existing advanced methods on the independent test data. The source code and benchmark datasets of TargetMM are freely available at https://github.com/sera616/TargetMM.git for academic use.
Keywords: Human disease, missense mutation, mutation prediction, feature extracting, classifier ensemble, proteins.
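The classifier ensemble step described in Materials and Methods can be illustrated with a minimal sketch. The abstract does not state how the three heterogeneous models are combined, so the soft-voting rule (a plain average of pathogenic-class probabilities), the 0.5 threshold, and the function names below are all assumptions made for illustration and are not taken from the TargetMM source code.

```python
from statistics import mean


def ensemble_probability(model_probs):
    """Average the pathogenic-class probabilities from several models.

    Assumption: plain soft voting with equal weights; the actual
    combination rule used by TargetMM is not given in the abstract.
    """
    if not model_probs:
        raise ValueError("need at least one model probability")
    for p in model_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return mean(model_probs)


def classify_mutation(model_probs, threshold=0.5):
    """Label a mutation from the averaged probability.

    The threshold of 0.5 is a hypothetical default, not a reported value.
    """
    if ensemble_probability(model_probs) >= threshold:
        return "pathogenic"
    return "neutral"


# Hypothetical scores from three heterogeneous models for one mutation.
scores = [0.82, 0.64, 0.71]
label = classify_mutation(scores)
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh two marginal ones, which is one common reason for choosing soft voting when the base models differ in kind.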