Abstract
Background: Single-cell RNA sequencing has emerged as an effective approach for revealing heterogeneity between cells and identifying differentiation stages. Adaptive total variation graph regularized nonnegative matrix factorization (ATV-NMF) has been proposed to capture the intrinsic geometric structure of the data and to decide adaptively whether to preserve feature details or to denoise, which makes it suitable for analyzing single-cell data. However, the rank of the matrix factorization strongly affects clustering performance, and determining the optimal rank remains challenging.
Objective: To address this problem, we propose ANMFCE, an ensemble clustering method that integrates several base clustering results obtained with different rank values.
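As an illustration of what "base clustering results corresponding to different rank values" can look like, the sketch below factorizes a nonnegative genes-by-cells matrix at several ranks and assigns each cell to the factor with the largest coefficient in H. It is a minimal sketch that assumes plain Lee-Seung NMF with multiplicative updates as a stand-in for ATV-NMF: it omits the graph-regularization and adaptive total variation terms. The function names `nmf_labels` and `base_clusterings`, the argmax assignment rule, the rank list, and the toy matrix are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def nmf_labels(X, rank, n_iter=300, seed=0, eps=1e-10):
    """Plain NMF (X ~ W @ H) via Lee-Seung multiplicative updates.

    Hypothetical stand-in for ATV-NMF: no graph or total-variation terms.
    X is a nonnegative genes-by-cells matrix; returns one label per cell.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # multiplicative updates that decrease the Frobenius loss ||X - WH||^2
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # assign each cell (column of X) to its dominant factor
    return H.argmax(axis=0)


def base_clusterings(X, ranks, seed=0):
    """One base clustering per rank; returns a (len(ranks), n_cells) array."""
    return np.vstack([nmf_labels(X, r, seed=seed) for r in ranks])


# hypothetical toy data: two groups of cells driven by disjoint gene sets
X = np.zeros((6, 8))
X[:3, :4] = 5.0
X[3:, 4:] = 5.0
labelings = base_clusterings(X, ranks=[2, 3, 4])
print(labelings)
```

Each row of `labelings` is one base clustering. Rows produced at different ranks generally need not agree, which is the situation an ensemble step is meant to resolve.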
Methods: First, the ATV-NMF algorithm is run with different dimension-reduction ranks to obtain a set of base clustering results. Second, a consensus function based on connected-triple-based similarity is applied to these results to obtain a similarity matrix between cells. Finally, spectral clustering is applied to this similarity matrix to obtain the final partition.
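The consensus and final partitioning steps can be sketched as follows, assuming the commonly used formulation of connected-triple-based similarity: every cluster of every base clustering is a graph node, two clusters are linked with the Jaccard overlap of their member sets, the weighted connected-triple (WCT) score of two clusters sums, over each shared neighbour, the smaller of their two edge weights, and this score is normalized by its maximum and scaled by a decay constant `dc`. The similarity of two cells is then averaged over base clusterings: 1 if they share a cluster, otherwise the WCT-based similarity of their two clusters. The final step uses normalized spectral clustering (leading eigenvectors of D^{-1/2} S D^{-1/2}, row normalization, then k-means). The value `dc=0.8`, the function names, the k-means initialization, and the toy labelings are assumptions, and ANMFCE's exact normalization may differ.

```python
import numpy as np


def cts_matrix(labelings, dc=0.8):
    """Connected-triple-based similarity between samples (cells).

    labelings: (M, n) integer array, one row per base clustering.
    dc: decay constant in (0, 1]; 0.8 is an assumed default.
    """
    labelings = np.asarray(labelings)
    M, n = labelings.shape
    blocks = []
    for lab in labelings:
        _, inv = np.unique(lab, return_inverse=True)
        B = np.zeros((n, inv.max() + 1))
        B[np.arange(n), inv] = 1.0              # one-hot cluster membership
        blocks.append(B)
    Hm = np.hstack(blocks)                       # n x K, K = clusters over all runs
    inter = Hm.T @ Hm                            # shared members for every cluster pair
    size = np.diag(inter)
    W = inter / (size[:, None] + size[None, :] - inter)  # Jaccard edge weights
    np.fill_diagonal(W, 0.0)
    # WCT(x, y) = sum over neighbours z of min(w_xz, w_yz)
    wct = np.minimum(W[:, None, :], W[None, :, :]).sum(axis=2)
    np.fill_diagonal(wct, 0.0)
    sim = dc * wct / wct.max() if wct.max() > 0 else wct
    S = np.zeros((n, n))
    offset = 0
    for B in blocks:
        idx = B.argmax(axis=1) + offset          # global cluster id of each sample
        same = idx[:, None] == idx[None, :]
        S += np.where(same, 1.0, sim[np.ix_(idx, idx)])
        offset += B.shape[1]
    return S / M


def spectral_clustering(S, k, n_iter=100, seed=0):
    """Normalized spectral clustering on a precomputed similarity matrix."""
    d_is = 1.0 / np.sqrt(S.sum(axis=1))
    A = d_is[:, None] * S * d_is[None, :]        # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(A)                  # eigenvalues in ascending order
    U = vecs[:, -k:]                             # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # k-means on the rows of U with farthest-point initialization
    rng = np.random.default_rng(seed)
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# three hypothetical base clusterings of six cells
labelings = [[0, 0, 0, 1, 1, 1],
             [1, 1, 1, 0, 0, 0],
             [0, 0, 0, 1, 1, 2]]
S = cts_matrix(labelings)
print(spectral_clustering(S, k=2))
```

In this toy input the third base clustering splits cell 5 from cells 3 and 4, but because clusters {3, 4} and {5} share neighbours in the other two runs, their cells still receive a high similarity, and spectral clustering recovers the two groups.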
Results: Clustering results on six single-cell sequencing datasets show that our method outperforms the individual ATV-NMF method and the other compared methods, indicating that it is effective at revealing heterogeneity in single-cell datasets. Moreover, our method also identifies marker genes accurately.
Conclusion: In summary, our method is effective for analyzing single-cell RNA sequencing datasets.
Keywords: Ensemble clustering, dimension reduction, adaptive total variation, graph regularization, nonnegative matrix factorization, single-cell RNA sequencing.
Graphical Abstract