Abstract
Background: Inferring feature importance is both a promise and a challenge in bioinformatics and computational biology. Although multiple computational methods exist for identifying the decisive factors of single-cell subpopulations, there is still a need for a comprehensive toolkit that presents an intuitive and customizable view of feature importance.
Objective: We developed Feature-scML, a scalable and user-friendly toolkit that allows users to visualize and reveal decisive factors in single-cell omics analysis.
Methods: Feature-scML incorporates three main functions: (i) seven feature selection algorithms comprehensively score and rank every feature; (ii) four machine learning approaches, combined with an incremental feature selection (IFS) strategy, jointly determine the number of selected features; and (iii) Feature-scML supports feature-importance visualization, model performance evaluation, and model interpretation. The source code is available at https://github.com/liameihao/Feature-scML.
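The abstract does not name Feature-scML's seven ranking algorithms or its four classifiers, so the following is only a minimal sketch of the general rank-then-IFS idea under stated assumptions: a Fisher score stands in for the feature-ranking step, and a nearest-centroid classifier evaluated by 5-fold cross-validation stands in for the machine learning models. All function names (`fisher_score`, `rank_features`, `cv_accuracy`, `incremental_feature_selection`, `make_toy_data`) are hypothetical and are not part of Feature-scML's API.

```python
import random
import statistics


def fisher_score(column, labels):
    """Fisher score of one feature: between-class over within-class scatter."""
    overall = statistics.fmean(column)
    between = within = 0.0
    for c in sorted(set(labels)):
        vals = [x for x, lab in zip(column, labels) if lab == c]
        mu = statistics.fmean(vals)
        between += len(vals) * (mu - overall) ** 2
        within += sum((x - mu) ** 2 for x in vals)
    if within == 0.0:
        # Perfectly separated feature gets +inf; a constant feature gets 0.
        return float("inf") if between > 0.0 else 0.0
    return between / within


def rank_features(X, y):
    """Return feature indices sorted from most to least important."""
    scores = [fisher_score(col, y) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def nearest_centroid_predict(train_X, train_y, test_X):
    """Assign each test row to the class whose training centroid is closest."""
    centroids = {}
    for c in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == c]
        centroids[c] = [statistics.fmean(col) for col in zip(*rows)]

    def sq_dist(row, c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))

    return [min(centroids, key=lambda c: sq_dist(row, c)) for row in test_X]


def cv_accuracy(X, y, features, k=5, seed=0):
    """k-fold cross-validated accuracy using only the given feature indices."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    def subset(rows):
        return [[X[i][f] for f in features] for i in rows]

    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = nearest_centroid_predict(subset(train), [y[i] for i in train], subset(fold))
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def incremental_feature_selection(X, y, k=5):
    """IFS: evaluate the top-1, top-2, ... ranked features and keep the best prefix."""
    ranking = rank_features(X, y)
    curve = [cv_accuracy(X, y, ranking[:n], k) for n in range(1, len(ranking) + 1)]
    best_acc = max(curve)
    best_n = curve.index(best_acc) + 1  # smallest feature count reaching the best score
    return ranking[:best_n], best_acc, curve


def make_toy_data(n=40, n_noise=4, seed=42):
    """Hypothetical toy data: feature 0 separates the two classes, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[3.0 * lab + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(n_noise)] for lab in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, acc, curve = incremental_feature_selection(X, y)
    print("selected features:", selected)
    print("best CV accuracy:", round(acc, 3))
    print("IFS curve:", [round(a, 3) for a in curve])
```

In this sketch the ranking is computed once on the full dataset before cross-validation, so the reported accuracy is optimistic; a more careful pipeline would derive the ranking from the training portion only. Taking the smallest feature count that reaches the best accuracy favors compact feature sets, which is one common way to read an IFS curve.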
Results: We systematically compared the performance of the seven feature selection algorithms in Feature-scML on two single-cell transcriptome datasets. The results demonstrate the effectiveness and power of Feature-scML.
Conclusion: Feature-scML is effective for analyzing single-cell RNA omics datasets, automating the machine learning process and supporting customized visual analysis of the results.
Keywords: Feature ranking, bioinformatics, machine learning, Python, feature selection, visualization