Abstract
Background: Bioluminescence is a distinctive natural phenomenon. It plays an important role in the life cycles of some organisms and is valuable in biomedical research, for example in gene expression analysis and bioluminescence imaging. In recent years, researchers have developed a number of methods for predicting bioluminescent proteins (BLPs). The accuracy of these methods has increased, but there is still room for improvement.
Methods: In this study, a new method for predicting bioluminescent proteins, based on a voting algorithm, is proposed. Four sequence-based feature extraction methods were used, yielding a total of 314 features derived from amino acid composition, physicochemical properties and the composition of k-spaced amino acid pairs. A voting algorithm was then used to build the prediction model, which was optimized to achieve the highest Matthews correlation coefficient (MCC). To obtain the best-performing model, the choice of base classifiers and vote-counting rules is discussed.
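A minimal Python sketch of two of the sequence encodings named above (amino acid composition and k-spaced amino acid pair composition) and of one simple vote-counting rule may help make the pipeline concrete. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the abstract does not specify the values of k, the physicochemical features, or how the features were reduced to 314 dimensions (a single k-spaced pair block already has 20 × 20 = 400 entries). Simple majority voting, with ties resolved toward the positive class, is only one possible vote-counting rule.

```python
from collections import Counter
from itertools import product

# Hypothetical helpers for illustration; not the study's code.
# The 20 standard amino acids in a fixed order (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in seq (20 values).

    Non-standard residue letters are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    counts = Counter(residues)
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def k_spaced_pair_composition(seq, k):
    """Frequency of each ordered pair (a, b) separated by k residues (400 values).

    k = 0 means adjacent residues; pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    pairs = [
        (seq[i], seq[i + k + 1])
        for i in range(len(seq) - k - 1)
        if seq[i] in AMINO_ACIDS and seq[i + k + 1] in AMINO_ACIDS
    ]
    counts = Counter(pairs)
    total = len(pairs)
    # product() yields pairs in row-major order: index = idx(a) * 20 + idx(b).
    return [counts[p] / total if total else 0.0
            for p in product(AMINO_ACIDS, repeat=2)]


def majority_vote(predictions):
    """Combine 0/1 labels from several base classifiers by simple majority.

    Ties go to the positive class (1); this tie rule is a hypothetical choice,
    not necessarily the rule used in the study.
    """
    positives = sum(predictions)
    return 1 if 2 * positives >= len(predictions) else 0
```

For example, `k_spaced_pair_composition("ACDA", 1)` assigns 0.5 to each of the pairs (A, D) and (C, A), and `majority_vote([1, 0, 1])` returns 1.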
Results and Conclusion: On the test set, the proposed model achieved 93.4% accuracy, 93.4% sensitivity and 91.7% specificity, outperforming the other methods compared. Applying the same model-building approach to a previous prediction of bioluminescent proteins in three lineages also greatly improved its accuracy.
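The reported metrics, and the MCC used for model selection, follow the standard definitions over confusion-matrix counts (true and false positives and negatives). The sketch below computes them; the function name and the example counts are hypothetical and are not the study's data.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    Assumes the evaluation set contains both positive and negative examples.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # fraction of true BLPs predicted as BLPs
    specificity = tn / (tn + fp)  # fraction of non-BLPs predicted as non-BLPs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc
```

With 90 true positives, 80 true negatives, 20 false positives and 10 false negatives, it returns an accuracy of 0.85, a sensitivity of 0.9, a specificity of 0.8 and an MCC of about 0.704. MCC is a common selection criterion because it uses all four cells of the confusion matrix and remains informative when the classes are imbalanced.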
Keywords: Bioluminescent proteins, prediction, feature extraction, voting algorithm, base classifiers, vote counting rules.