Abstract
Background: Single-cell RNA sequencing is an advanced technology that makes it possible to unravel cellular heterogeneity and analyze gene expression at single-cell resolution. However, owing to technical limitations, many dropout events occur during sequencing, and these adversely affect downstream analysis.
Methods: To address the dropout events in single-cell RNA sequencing data, we propose scTSSR-D, an imputation method that recovers gene expression using two-side self-representation together with dropout information. scTSSR-D is the first global method that incorporates a partial imputation strategy to impute dropout values. In other words, it makes full use of gene, cell, and dropout information when recovering gene expression.
Results: scTSSR-D outperforms existing methods in the following experiments: recovering the Gini coefficients and gene-to-gene correlations observed in single-molecule RNA fluorescence in situ hybridization data, down-sampling experiments, differential expression analysis, and cell clustering accuracy.
Conclusion: scTSSR-D is a more stable and reliable method for recovering gene expression than existing methods. Moreover, its improvement over existing methods is even more pronounced on large datasets.
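As an illustration of the Gini coefficient comparison mentioned in the Results, the following is a minimal sketch of how a per-gene Gini coefficient could be computed from expression values. It is not taken from the scTSSR-D implementation: the function name `gini` and the toy vectors `true_counts` and `observed_counts` are hypothetical, and the formula is the standard rank-based estimator for non-negative values.

```python
import numpy as np


def gini(values):
    """Gini coefficient of a non-negative 1-D array (hypothetical helper).

    Uses the rank-based form G = sum_i (2i - n - 1) * x_(i) / (n * sum(x)),
    where x_(i) are the values sorted in ascending order and i starts at 1.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        # Convention chosen for this sketch: an empty or all-zero gene has no inequality.
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(np.dot(2 * ranks - n - 1, x) / (n * total))


# Toy example (made-up numbers): one gene measured in four cells.
true_counts = np.array([2, 3, 4, 5])      # assumed "true" expression
observed_counts = np.array([0, 3, 0, 5])  # same gene after two simulated dropouts

# Dropouts push values to zero, which raises the apparent inequality of the gene.
print(gini(true_counts), gini(observed_counts))
```

In a comparison of this kind, the Gini coefficient of each gene after imputation would be set against the value measured for the same gene by a dropout-free reference assay, and an imputation method is judged by how closely the two agree.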