
Recent Advances in Computer Science and Communications


ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

Matrix Factorization-based Improved Classification of Gene Expression Data

Author(s): Shaily Malik* and Poonam Bansal

Volume 13, Issue 5, 2020

Pages: 858-863 (6 pages)

DOI: 10.2174/2213275912666190715165034


Abstract

Background: Medical data, in the form of prescriptions and test reports, are very extensive and need comprehensive analysis.

Objective: A gene expression data set consists of a very large number of genes associated with thousands of samples. Identifying relevant biological information from these complex associations is a difficult task.

Methods: For this purpose, a variety of classification algorithms are available that can detect the desired information automatically. The K-Nearest Neighbour algorithm, Latent Dirichlet Allocation, Gaussian Naïve Bayes and the Support Vector Classifier are some of the well-known algorithms used for classification. Nonnegative Matrix Factorization (NMF) has gained considerable popularity because of its nonnegativity constraints, which make the resulting factors easier to interpret.
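The abstract does not say which NMF algorithm was used. As a minimal sketch under that caveat, the function below assumes the standard Lee and Seung multiplicative updates for the Frobenius-norm objective ||V - WH||², applied to a synthetic nonnegative matrix with genes as rows and samples as columns. The name `nmf_multiplicative` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix V (genes x samples) as W @ H.

    Illustrative sketch: Lee-Seung multiplicative updates minimising the
    Frobenius norm ||V - WH||_F^2. Each update multiplies by a ratio of
    nonnegative terms, so W and H stay nonnegative throughout.
    """
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("NMF requires a nonnegative input matrix")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank))    # basis matrix ("metagenes")
    H = rng.random((rank, n_samples))  # coefficients, one column per sample
    for _ in range(n_iter):
        # eps keeps the denominators away from zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Usage on a synthetic nonnegative matrix: 50 "genes" x 12 "samples".
V = np.random.default_rng(1).random((50, 12))
W, H = nmf_multiplicative(V, rank=3)
features = H.T  # samples x rank: a reduced feature matrix for a classifier
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

Used as preprocessing, the low-rank coefficients in `H.T` would stand in for the original gene measurements as classifier input; the rank of 3 here is an arbitrary assumption, not a value from the paper.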

Result: In this paper, we applied NMF as a preprocessing step to improve classification results. We evaluated the classifiers on four criteria: accuracy, precision, specificity and recall.
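The four evaluation criteria can all be derived from confusion-matrix counts. The abstract does not state how they were averaged over classes, so this sketch assumes a binary setting in which one label is treated as positive (one-vs-rest would repeat it per class). The function `binary_metrics` and its `positive` parameter are hypothetical names; undefined ratios are reported as 0.0 by assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (sensitivity) and specificity for a
    binary problem, computed from true/false positive/negative counts."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative

    def ratio(num, den):
        return num / den if den else 0.0  # assumed convention when undefined

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels (not data from the paper): tp=3, fn=1, tn=2, fp=2.
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1]))
# → {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'specificity': 0.5}
```

Recall measures how many positives are found, while specificity measures how many negatives are correctly rejected, which is why both are reported alongside accuracy and precision.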

Conclusion: The experimental results show that these classifiers perform better when NMF is applied to preprocess the data before it is passed to them. The Gaussian Naïve Bayes algorithm showed a significant improvement in classification after NMF preprocessing.
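The comparison described in the conclusion, the same classifier trained with and without NMF preprocessing, can be sketched with scikit-learn pipelines. This is an illustration under stated assumptions, not the authors' setup: the data are synthetic (`make_classification`), and the min-max scaling, NMF rank, initialisation and 5-fold cross-validation are all choices made here. The helper `compare_nb_with_and_without_nmf` is hypothetical, and the scores it prints say nothing about the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import NMF
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def compare_nb_with_and_without_nmf(n_components=10, seed=0):
    # Synthetic stand-in for expression data: many features, few samples.
    X, y = make_classification(n_samples=120, n_features=500,
                               n_informative=20, random_state=seed)
    # clip=True keeps held-out values inside [0, 1], so NMF never sees negatives.
    baseline = make_pipeline(MinMaxScaler(clip=True), GaussianNB())
    with_nmf = make_pipeline(
        MinMaxScaler(clip=True),
        NMF(n_components=n_components, init="nndsvda",
            max_iter=500, random_state=seed),
        GaussianNB(),
    )
    scores = {}
    for name, model in (("GaussianNB", baseline),
                        ("NMF + GaussianNB", with_nmf)):
        # Mean accuracy over 5 stratified folds; NMF is refit inside each fold.
        scores[name] = cross_val_score(model, X, y, cv=5,
                                       scoring="accuracy").mean()
    return scores


scores = compare_nb_with_and_without_nmf()
print(scores)
```

Placing NMF inside the pipeline means the factorization is learned only from each training fold, which avoids leaking information from the held-out samples into the preprocessing step.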

Keywords: Nonnegative matrix factorization, classification algorithms, data mining, specificity, recall, accuracy, precision.



© 2024 Bentham Science Publishers