Abstract
Objective: Cancer is one of the most serious diseases affecting human health. Early diagnosis and control significantly increase the chances of a cure. Detecting cancer biomarkers in body fluids is therefore attracting growing attention among oncologists. In-silico prediction of body fluid-related proteins, which can serve as cancer biomarkers, offers a complement to labor-intensive and time-consuming biochemical experiments.
Methods: In this work, we propose a novel method for high-throughput identification of cancer biomarkers in human body fluids. We incorporate physicochemical properties into the weighted observed percentages (WOP) and position-specific scoring matrix (PSSM) profiles to strengthen the features that reflect the evolutionary conservation of body fluid-related proteins. The least absolute shrinkage and selection operator (LASSO) is then used as a feature selection strategy to generate the optimal feature subset.
Results: Ten-fold cross-validation on the training datasets demonstrates the accuracy of the proposed model. We also test the method on independent testing datasets and apply it to the identification of potential cancer biomarkers in human body fluids.
Conclusion: The results on the independent testing datasets indicate that our approach generalizes well.
Keywords: Cancer biomarkers, Body fluid, Evolutionary conservation, Physicochemical properties, LASSO, PSSM.
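The LASSO feature selection step mentioned in the Methods can be illustrated with a minimal sketch. It assumes a squared-error LASSO objective solved by cyclic coordinate descent on a toy design matrix; the function names, the toy data, and the penalty value `alpha` are hypothetical and are not taken from the paper, whose actual features (WOP and PSSM profiles) and solver may differ.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of rows (samples x features) and y a list of targets.
    Columns are assumed to be centred beforehand; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def select_features(weights, tol=1e-8):
    """Return indices of features whose LASSO weight is non-zero."""
    return [j for j, wj in enumerate(weights) if abs(wj) > tol]


def toy_design():
    """Hypothetical 8 x 4 design with mutually orthogonal +/-1 columns."""
    rows = []
    for i in range(8):
        a = 1.0 if i & 1 == 0 else -1.0
        b = 1.0 if i & 2 == 0 else -1.0
        c = 1.0 if i & 4 == 0 else -1.0
        rows.append([a, b, c, a * b])
    return rows


X = toy_design()
# Toy target that depends only on features 0 and 2 (an assumption for illustration).
y = [2.0 * row[0] - 3.0 * row[2] for row in X]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = select_features(weights)
print(selected)  # features 0 and 2 are kept; 1 and 3 are shrunk to zero
```

In this sketch the L1 penalty drives the weights of uninformative features exactly to zero, which is what makes LASSO usable as a feature selector: the retained subset is simply the set of features with non-zero weight. In practice `alpha` would be tuned, for example by cross-validation.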