Abstract
Background: Because each cell contains only a small amount of mRNA, scRNA-seq data typically contain many missing values, which prevents accurate quantification of gene expression in single cells. This dropout phenomenon means that genes truly expressed in some cells go undetected, which severely affects downstream analyses of scRNA-seq data such as cell clustering and cell developmental trajectory inference.
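To make the dropout problem concrete, the following Python sketch (using NumPy) zeroes out a random fraction of the nonzero entries of a small, made-up count matrix. It assumes that every expressed entry drops out independently with the same probability, which is a simplification: in real scRNA-seq data the dropout rate depends on expression level. The function name `simulate_dropout` and the toy counts are hypothetical and are not part of DSAE-Impute.

```python
import numpy as np


def simulate_dropout(true_counts, dropout_rate, seed=0):
    """Zero out a random fraction of the expressed (nonzero) entries.

    Toy assumption: each expressed entry drops out independently with the
    same probability `dropout_rate`, regardless of its expression level.
    Returns the observed matrix and a boolean mask of the dropped entries.
    """
    rng = np.random.default_rng(seed)
    dropped = (true_counts > 0) & (rng.random(true_counts.shape) < dropout_rate)
    observed = np.where(dropped, 0, true_counts)
    return observed, dropped


# Made-up matrix: 4 cells (rows) x 5 genes (columns); every gene is truly expressed.
true_counts = np.array([[5, 3, 8, 1, 2],
                        [4, 2, 7, 1, 3],
                        [9, 6, 1, 4, 5],
                        [8, 5, 2, 3, 6]])
observed, dropped = simulate_dropout(true_counts, dropout_rate=0.3)
print(observed)
print("zeros caused by dropout:", int(dropped.sum()))
```

In the observed matrix, a zero produced by dropout looks exactly like a gene that is not expressed, which is why an imputation method has to infer which zeros to fill.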
Objective: This study aims to develop an accurate deep learning method, DSAE-Impute, for imputing missing values in scRNA-seq data. DSAE-Impute uses stacked autoencoders to capture gene expression characteristics from the original data containing missing values, and during training it incorporates a discriminative correlation matrix between cells to capture global expression features, so that missing values can be predicted accurately.
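The abstract does not define how the discriminative correlation matrix between cells is built. As one possible reading, the sketch below computes Pearson correlations between cells on log-transformed counts and keeps only each cell's k most correlated other cells, so that every cell is linked to a few similar cells rather than to all of them. The top-k sparsification, the row normalization, and the name `cell_similarity` are assumptions for illustration, not the DSAE-Impute definition.

```python
import numpy as np


def cell_similarity(counts, k=2):
    """Hypothetical discriminative cell-cell similarity matrix.

    Rows of `counts` are cells and columns are genes. Assumed construction:
    Pearson correlation between cells on log1p-transformed counts, keeping only
    each cell's k most correlated other cells (negative correlations set to 0),
    then scaling every row to sum to 1.
    """
    logged = np.log1p(counts)
    corr = np.nan_to_num(np.corrcoef(logged), nan=0.0)  # cell x cell; constant cells give NaN -> 0
    np.fill_diagonal(corr, -np.inf)                      # a cell is not its own neighbor
    sim = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        neighbors = np.argsort(corr[i])[-k:]             # indices of the k largest correlations
        sim[i, neighbors] = np.clip(corr[i, neighbors], 0.0, None)
    row_sums = sim.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                        # isolated cells stay as all-zero rows
    return sim / row_sums


counts = np.array([[5, 3, 8, 1, 2],
                   [4, 2, 7, 1, 3],
                   [9, 6, 1, 4, 5],
                   [8, 5, 2, 3, 6]], dtype=float)
similarity = cell_similarity(counts, k=1)
print(np.round(similarity, 2))
```

In this toy matrix, cells 0 and 1 share one expression profile and cells 2 and 3 share another, so with `k=1` each cell is linked only to its partner.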
Methods: We propose DSAE-Impute, a novel deep learning model based on discriminative stacked autoencoders, to impute missing values in scRNA-seq data. DSAE-Impute embeds discriminative cell similarity to refine the feature representation learned by the stacked autoencoders, and comprehensively learns the expression patterns of scRNA-seq data through layer-by-layer training to achieve accurate imputation.
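The layer-by-layer training described above can be illustrated with a small NumPy sketch of greedy stacked-autoencoder pretraining: each autoencoder is trained to reconstruct the output of the layers beneath it, and the trained encoders are then stacked. How DSAE-Impute embeds cell similarity is not specified in the abstract; here we assume, purely for illustration, that a row-normalized cell-similarity matrix mixes each cell's top-level code with those of similar cells before decoding, and that only zero entries are replaced. The layer sizes, learning rate, number of epochs, the absence of a fine-tuning stage, and every function name are hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(x, n_hidden, epochs=3000, lr=0.05, seed=0):
    """Train one autoencoder layer (sigmoid encoder, linear decoder) with
    full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        h = sigmoid(x @ w_enc + b_enc)                   # encode
        x_hat = h @ w_dec + b_dec                        # decode
        grad_out = 2.0 * (x_hat - x) / n                 # gradient of the loss w.r.t. x_hat
        grad_h = (grad_out @ w_dec.T) * h * (1.0 - h)    # back-propagate through the sigmoid
        w_dec -= lr * (h.T @ grad_out)
        b_dec -= lr * grad_out.sum(axis=0)
        w_enc -= lr * (x.T @ grad_h)
        b_enc -= lr * grad_h.sum(axis=0)
    return w_enc, b_enc, w_dec, b_dec


def encode(x, layers):
    """Pass data up through every trained encoder."""
    for w_enc, b_enc, _, _ in layers:
        x = sigmoid(x @ w_enc + b_enc)
    return x


def decode(h, layers):
    """Pass a top-level code back down through the decoders in reverse order."""
    for _, _, w_dec, b_dec in reversed(layers):
        h = h @ w_dec + b_dec
    return h


def train_stacked(x, layer_sizes, **kwargs):
    """Greedy layer-by-layer training: each new autoencoder learns to
    reconstruct the hidden code produced by the layers already trained."""
    layers = []
    for size in layer_sizes:
        layers.append(train_autoencoder(encode(x, layers), size, **kwargs))
    return layers


def impute(counts, similarity, layer_sizes=(4, 2), **kwargs):
    """Hypothetical imputation step: keep observed nonzero values and fill zeros
    from a reconstruction whose top-level codes are mixed across similar cells."""
    logged = np.log1p(counts)
    layers = train_stacked(logged, layer_sizes, **kwargs)
    code = similarity @ encode(logged, layers)         # assumption: share codes between similar cells
    recon = np.clip(decode(code, layers), 0.0, None)   # log-expression cannot be negative
    return np.expm1(np.where(counts > 0, logged, recon))


counts = np.array([[5, 0, 8, 1, 2],
                   [4, 2, 7, 0, 3],
                   [9, 6, 0, 4, 5],
                   [0, 5, 2, 3, 6]], dtype=float)
# Hypothetical row-normalized similarity: cells 0/1 and cells 2/3 resemble each other.
similarity = np.array([[0.5, 0.5, 0.0, 0.0],
                       [0.5, 0.5, 0.0, 0.0],
                       [0.0, 0.0, 0.5, 0.5],
                       [0.0, 0.0, 0.5, 0.5]])
imputed = impute(counts, similarity)
print(np.round(imputed, 2))
```

Treating every zero as a dropout is itself a simplification, since some zeros reflect genes that are genuinely not expressed; a practical method needs some way to tell the two apart.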
Results: We systematically evaluated the performance of DSAE-Impute on simulated and real datasets. The experimental results demonstrate that DSAE-Impute significantly improves downstream analysis and that its imputation results are more accurate than those of other state-of-the-art imputation methods.
Conclusion: Extensive experiments show that, compared with other state-of-the-art methods, DSAE-Impute produces more accurate imputation results on both simulated and real datasets, and that these results are more helpful for downstream analysis.
Keywords: scRNA-seq, imputation, gene expression, dropout event, discriminative stacked autoencoder, DSAE-Impute.