A New Approach for Predicting the Value of Gene Expression: Two-way Collaborative Filtering

Tuncay Bayrak; Hasan Oğul

doi:10.2174/1574893614666190126144139

Abstract

Background: Predicting the value of gene expression in a given condition is a challenging problem in computational systems biology. Only a limited number of studies in this area have offered solutions, and these predict expression as a particular pattern, whether or not this can be done effectively. However, the measured expression value itself is usually needed for further meta-data analysis.

Methods: We treat the problem as a regression task in which a feature representation of the gene under consideration is fed into a trained model to predict a continuous variable, namely its exact expression level. To support this task, we introduce a novel feature representation scheme based on two-way collaborative filtering. Our main argument is that the expressions of other genes in the current condition are as important as the expression of the current gene in other conditions. For regression analysis, linear regression and a recently popularized method, the Relevance Vector Machine (RVM), are used. Pearson and Spearman correlation coefficients and Root Mean Squared Error are used for evaluation. The effects of regression model type, RVM kernel functions, and parameters are analysed on gene expression profiling data comprising a set of prostate cancer samples.
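The two-way idea in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how neighbouring genes and conditions are chosen, so the code below assumes the k most Pearson-correlated genes and conditions are used. The function names (two_way_features, build_dataset, fit_linear, predict_linear, evaluate), the value of k, and the synthetic expression matrix are all hypothetical. Only the linear regression model is shown; the RVM is omitted.

```python
import numpy as np

def two_way_features(E, g, c, k=3):
    """Features for entry E[g, c] (assumed scheme, not the paper's exact one):
    values of the k most similar genes in condition c, followed by values of
    gene g in the k most similar conditions. E[g, c] itself is never read."""
    n_genes, n_conds = E.shape
    other_c = np.arange(n_conds) != c
    other_g = np.arange(n_genes) != g
    # Gene-to-gene similarity, computed without condition c.
    gene_sim = np.array([
        np.corrcoef(E[g, other_c], E[j, other_c])[0, 1] if j != g else -np.inf
        for j in range(n_genes)
    ])
    top_genes = np.argsort(gene_sim)[::-1][:k]
    # Condition-to-condition similarity, computed without gene g.
    cond_sim = np.array([
        np.corrcoef(E[other_g, c], E[other_g, j])[0, 1] if j != c else -np.inf
        for j in range(n_conds)
    ])
    top_conds = np.argsort(cond_sim)[::-1][:k]
    return np.concatenate([E[top_genes, c], E[g, top_conds]])

def build_dataset(E, entries, k=3):
    """Stack features and targets for a list of (gene, condition) pairs."""
    X = np.array([two_way_features(E, g, c, k) for g, c in entries])
    y = np.array([E[g, c] for g, c in entries])
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

def evaluate(y_true, y_pred):
    """RMSE, Pearson, and Spearman (rank-based, assuming no tied values)."""
    ranks = lambda v: np.argsort(np.argsort(v))
    return {
        "rmse": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
        "pearson": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "spearman": float(np.corrcoef(ranks(y_true), ranks(y_pred))[0, 1]),
    }

# Synthetic genes-by-conditions matrix (low rank plus noise), illustration only.
rng = np.random.default_rng(0)
E = rng.normal(size=(20, 2)) @ rng.normal(size=(15, 2)).T
E += 0.1 * rng.normal(size=E.shape)

entries = [(g, c) for g in range(20) for c in range(15)]
order = rng.permutation(len(entries))
train = [entries[i] for i in order[:200]]
test = [entries[i] for i in order[200:]]

X_train, y_train = build_dataset(E, train)
X_test, y_test = build_dataset(E, test)
w = fit_linear(X_train, y_train)
scores = evaluate(y_test, predict_linear(w, X_test))
print(scores)
```

Similarities are recomputed for every entry with the target's own row and column position excluded, so the value being predicted never leaks into its features. This is slow on real microarray data; a practical version would likely cache or approximate the neighbour search.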

Results: In addition to promising results from the experimental studies, the findings show that integrating data from another disease type, colon cancer in our case, can significantly improve the prediction performance of the regression model.

Conclusion: The results also show that the proposed feature representation approach and the RVM regression model are promising for many machine learning problems in microarray and high-throughput sequencing analysis.

Keywords: Relevance vector machine, two-way collaborative filtering, microarray, gene expression prediction, regression, feature representation.




Journal: Current Bioinformatics
Publisher Name: Bentham Science Publisher
Print ISSN: 1574-8936
Online ISSN: 2212-392X
DOI: https://dx.doi.org/10.2174/1574893614666190126144139
