Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Multi-label Learning for the Diagnosis of Cancer and Identification of Novel Biomarkers with High-throughput Omics

Author(s): Shicai Liu, Hailin Tang, Hongde Liu and Jinke Wang*

Volume 16, Issue 2, 2021

Published on: 23 June, 2020

Page: [261 - 273] Pages: 13

DOI: 10.2174/1574893615999200623130416

Abstract

Background: The advancement of bioinformatics and machine learning has facilitated the diagnosis of cancer and the discovery of omics-based biomarkers.

Objective: This study used a novel data-driven approach to classify normal samples and samples from different types of gastrointestinal cancer, and to identify potential biomarkers for the diagnosis and prognostic assessment of gastrointestinal cancer patients.

Methods: Different feature selection methods were used, and the diagnostic performance of the proposed biosignatures was benchmarked using support vector machine (SVM) and random forest (RF) models.

Results: All models performed satisfactorily, with Multilabel-RF performing best. The Multilabel-RF model achieved an accuracy of 83.12%, with precision, recall, F1, and Hamming loss of 79.70%, 68.31%, 0.7357, and 0.1688, respectively. Moreover, the proposed biomarker signatures were strongly associated with multiple hallmarks of cancer. Functional enrichment analysis was performed, and the impact of the biomarker candidates on patient prognosis was also examined.
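
The metrics named in the Results can be illustrated with a minimal Python sketch. The abstract does not state which averaging scheme was used, so the sketch assumes micro-averaging over every (sample, label) cell; the toy label matrices and the function names are hypothetical and are not taken from the study. The last lines only check arithmetic on the reported figures: the reported F1 equals the harmonic mean of the reported precision and recall, and the reported accuracy equals one minus the reported Hamming loss.

```python
from typing import Dict, List, Tuple

Labels = List[List[int]]  # one row per sample, one 0/1 column per label


def confusion_counts(y_true: Labels, y_pred: Labels) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN over every (sample, label) cell."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def multilabel_metrics(y_true: Labels, y_pred: Labels) -> Dict[str, float]:
    """Micro-averaged metrics (an assumption; the abstract names no scheme)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    hamming = (fp + fn) / total  # fraction of label cells predicted wrongly
    return {
        "precision": precision,
        "recall": recall,
        "f1": harmonic_f1(precision, recall),
        "hamming_loss": hamming,
        "label_accuracy": 1.0 - hamming,
    }


# Hypothetical toy data: 4 samples, 3 labels (not from the study).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0]]
toy = multilabel_metrics(y_true, y_pred)

# Arithmetic check on the figures reported in the abstract.
reported_f1 = round(harmonic_f1(0.7970, 0.6831), 4)
reported_accuracy = round(1.0 - 0.1688, 4)

if __name__ == "__main__":
    print(toy)
    print(reported_f1, reported_accuracy)
```

Under these assumptions, the agreement between accuracy and one minus Hamming loss suggests that the reported accuracy is a per-label quantity rather than an exact-match (subset) accuracy, although the abstract does not say so explicitly.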

Conclusion: We introduced a workflow based on multi-label learning with high-throughput omics for the diagnosis of cancer and the identification of novel biomarkers. The novel transcriptome biosignatures, which may improve diagnostic accuracy in gastrointestinal cancer, are proposed for further validation in various clinical settings.

Keywords: Gastrointestinal cancer, machine learning, multi-label learning, transcriptomics, diagnostic biomarkers, omics.

