Abstract
Background: Despite medical advances in personalized treatment and extensive research into the genetic patterns underlying its different manifestations in humans, the unequivocal and effective treatment of cancer remains an unresolved challenge. Until a universal means of controlling the disease is achieved, early detection mechanisms for preventive diagnosis increasingly make it possible to avoid treatments of unreliable effectiveness. The discovery of unequivocal gene patterns that discriminate between multiple pathological states could help clarify the diagnosis of patients with a suspected oncological disease whose histological and immunohistochemical results are inconclusive.
Methods: This study presents an approach to pan-cancer diagnosis based on gene expression analysis. It identifies a reduced set of 12 genes that makes it possible to distinguish between 14 major cancer types.
Results: Our cascade machine learning process proved robust, obtaining a mean F1 score of 92% and a mean AUC of 99.37% on the test set. The analyzed genes, which can act as oncogenes or tumor suppressor genes, showed heterogeneous over- or underexpression. LPAR5 and PAX8 were upregulated in thyroid cancer samples. KLF5 was highly expressed in the majority of cancer types.
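As an illustration of how the two reported metrics are commonly computed for a multiclass problem, a minimal sketch follows. It assumes macro averaging for F1 and a one-vs-rest scheme for AUC, neither of which is specified in this abstract. The function names, class labels, and probabilities are hypothetical toy values, not data or code from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(is_positive, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Mean one-vs-rest AUC; proba[i][k] is the score of sample i for classes[k]."""
    aucs = []
    for k, label in enumerate(classes):
        flags = [t == label for t in y_true]
        aucs.append(binary_auc(flags, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


# Hypothetical toy example: three placeholder classes, six samples.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
proba = [
    [0.80, 0.10, 0.10],
    [0.50, 0.30, 0.20],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],  # a "B" sample whose top score is for "A"
    [0.10, 0.20, 0.70],
    [0.20, 0.20, 0.60],
]
# Predicted label = class with the highest score in each row.
y_pred = [classes[max(range(len(classes)), key=row.__getitem__)] for row in proba]

f1 = macro_f1(y_true, y_pred)
auc = macro_ovr_auc(y_true, proba, classes)
print(f"macro F1 = {f1:.4f}, macro one-vs-rest AUC = {auc:.4f}")
```

In this toy case one sample is misclassified, which lowers the macro F1, while every class still ranks its own samples above all others, so the AUC stays at its maximum. This shows how a mean AUC can sit well above the mean F1 score for the same classifier.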
Conclusion: Our model constitutes a useful tool for pan-cancer gene expression evaluation. In addition to providing biological clues about a hypothetical common origin of cancer, the scalability of this approach should allow future studies to reinforce, confirm, and extend the biological observations presented here. To support reproducibility, the code and datasets are available in the following GitHub repository: https://github.com/CasedUgr/PanCancerClassification.
Keywords: PanCancer, RNA-Seq, TCGA, Gene Expression, Machine Learning, Feature Selection, CDSS