Abstract
Background: Medical data, in the form of prescriptions and test reports, is very extensive and requires comprehensive analysis.
Objective: A gene expression data set is formed from a very large number of genes associated with thousands of samples. Identifying the relevant biological information from these complex associations is a difficult task.
Methods: For this purpose, a variety of classification algorithms are available that can detect the desired information automatically. K-Nearest Neighbour, Latent Dirichlet Allocation, Gaussian Naïve Bayes and Support Vector Classifier are some of the well-known algorithms used for the classification task. Nonnegative Matrix Factorization (NMF) is a technique that has gained considerable popularity because of its nonnegativity constraints, which can make the data easier to interpret.
Result: In this paper, we applied NMF as a pre-processing step to improve classification results. We also evaluated the given classifiers on four criteria: accuracy, precision, specificity and recall.
Conclusion: The experimental results show that these classifiers perform better when NMF is applied as a pre-processing step before the data is passed to them. The Gaussian Naïve Bayes algorithm showed a significant improvement in classification after NMF was applied at pre-processing.
Keywords: Nonnegative matrix factorization, classification algorithms, data mining, specificity, recall, accuracy, precision.
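As an illustration of the approach summarized above, the following is a minimal sketch rather than the implementation used in the paper. It assumes NMF is computed with the standard multiplicative update rules on a nonnegative samples-by-genes matrix, and that the four evaluation criteria are derived from a binary confusion matrix. The function names `nmf` and `binary_metrics`, the rank, and the iteration count are hypothetical choices made for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (samples x genes) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective; this is
    an assumed choice, not necessarily the variant used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = V.shape
    # Strictly positive initialization keeps every update nonnegative.
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_genes)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and specificity for labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Synthetic nonnegative data standing in for an expression matrix.
    V = np.random.default_rng(1).random((6, 8))
    W, H = nmf(V, rank=3)
    # Each row of W is a reduced, nonnegative feature vector for one sample.
    print(W.shape, H.shape)
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

In this sketch the rows of `W` would serve as the reduced features handed to a classifier in place of the raw expression values, and `binary_metrics` would be applied to that classifier's predictions on held-out samples.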