Abstract
Expression quantitative trait loci (eQTL) are used as a tool to identify genetic causes of natural variation in gene expression. Only in a few cases is the expression of a gene controlled by a variant at a single genetic marker. Interaction effects occur at many levels of complexity: within markers, within genes, and between markers and genes. This complexity is a daily challenge for biostatisticians and bioinformaticians and makes findings hard to obtain. To simplify the analysis and better control confounders, we tried a new approach to association analysis between genotypes and expression data. We sought to understand whether discretization of expression data can be useful in genome-transcriptome association analyses. With the dependent variable discretized, algorithms for learning classifiers from data and for performing block selection were used to help understand the relationship between the expression of a gene and genetic markers. We present the results of using this approach to detect new possible causes of expression variation of DRB5, a gene that plays an important role within the immune system. In addition to DRB5 expression obtained with classical microarray technology, we also measured DRB5 expression with the more recent next-generation sequencing technology. A supplementary website, including a link to software implementing the method, can be found at http://bios.ugr.es/DRB5.
Keywords: eQTL, gene expression microarray, machine learning, RNA-seq, SNP.
Graphical Abstract