Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Feature-scML: An Open-source Python Package for the Feature Importance Visualization of Single-Cell Omics with Machine Learning

Author(s): Pengfei Liang, Hao Wang, Yuchao Liang, Jian Zhou, Haicheng Li and Yongchun Zuo*

Volume 17, Issue 7, 2022

Published on: 27 August, 2022

Pages: 578-585 (8 pages)

DOI: 10.2174/1574893617666220608123804

Abstract

Background: Inferring feature importance is both a promise and a challenge in bioinformatics and computational biology. Although multiple computational methods exist to identify the decisive factors of single-cell subpopulations, a comprehensive toolkit that presents an intuitive, customizable view of feature importance is still needed.

Objective: We developed Feature-scML, a scalable and user-friendly toolkit that allows users to visualize and reveal decisive factors in single-cell omics analysis.

Methods: Feature-scML provides three main functions: (i) seven feature selection algorithms score and rank every feature; (ii) four machine learning approaches, combined with an incremental feature selection (IFS) strategy, jointly determine the number of selected features; (iii) the toolkit supports feature importance visualization, model performance evaluation, and model interpretation. The source code is available at https://github.com/liameihao/Feature-scML.
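The ranking and IFS steps described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not Feature-scML's own code or API: the abstract does not name the seven ranking algorithms or the four classifiers, so scikit-learn's ANOVA F-score (`f_classif`) and a linear SVM stand in for them, the data are synthetic, and the function names `rank_features` and `incremental_feature_selection` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def rank_features(X, y):
    """Return feature indices sorted from highest to lowest ANOVA F-score.

    Assumption: the F-score is only a stand-in for a ranking algorithm.
    """
    scores, _ = f_classif(X, y)
    scores = np.nan_to_num(scores)  # constant features would yield NaN
    return np.argsort(scores)[::-1]


def incremental_feature_selection(X, y, ranking, max_features=None, cv=5):
    """Score the top-k ranked features for k = 1..max_features.

    Returns the k with the highest mean cross-validated accuracy and the
    full list of accuracies, one per k.
    """
    max_features = max_features or len(ranking)
    accuracies = []
    for k in range(1, max_features + 1):
        subset = X[:, ranking[:k]]  # keep only the k best-ranked features
        # Assumption: a linear SVM stands in for the unnamed classifiers.
        acc = cross_val_score(SVC(kernel="linear"), subset, y, cv=cv).mean()
        accuracies.append(acc)
    best_k = int(np.argmax(accuracies)) + 1
    return best_k, accuracies


# Synthetic stand-in for a cells x genes expression matrix with labels.
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           n_redundant=0, random_state=0)
ranking = rank_features(X, y)
best_k, accuracies = incremental_feature_selection(X, y, ranking,
                                                   max_features=15)
print(f"best number of features: {best_k}")
print(f"cross-validated accuracy: {accuracies[best_k - 1]:.3f}")
```

Plotting `accuracies` against k gives the usual IFS curve, and the peak of that curve marks the number of features to keep. Swapping in another ranking function or classifier changes only the two lines marked as assumptions.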

Results: We systematically compared the performance of the seven feature selection algorithms in Feature-scML on two single-cell transcriptome datasets. The comparison demonstrates the effectiveness and power of Feature-scML.

Conclusion: Feature-scML is effective for analyzing single-cell RNA omics datasets, automating the machine learning process and allowing customized visual analysis of the results.

Keywords: Feature ranking, bioinformatics, machine learning, Python, feature selection, visualization

