Abstract
Background: Genomic studies in many important crops are generating large amounts of data from cDNA sequencing and RNA expression experiments. These data complement efforts to develop new plant varieties that resist major biotic threats worldwide, cope with climate change, and offer better quality. After the initial exploratory phase of genome sequencing and functional characterization of genes of interest, a post-genomics phase is moving towards understanding the function of the organism as a whole through Systems Biology.
Objective: To develop a software architecture that facilitates gene network inference from high-throughput gene expression data collected from microarray experiments.
Method: A pipeline architecture for data mining was designed and built, then validated using known pathways for starch and sucrose metabolism in plants.
Results: The pipeline supports functional annotation of both putative homologs and new genes, and also allows the identification of novel co-expressed gene clusters related to metabolically important traits.
Conclusion: Our approach can be transferred between organisms, taking advantage of an open and adaptable platform in the R language, and its visualizations of gene expression networks can be easily incorporated for web access.
Keywords: Network inference, naïve Bayes classifier, paralog detection, systems biology, microarray, gene expression networks.
Graphical Abstract