Abstract
Proteomic analysis inherently involves handling and interpreting large amounts of data. Originally, proteome analysis aimed to collect comprehensive information on all proteins present in a specified sample. Proteome inventories have since developed into highly sophisticated databases that collect many different kinds of data, including two-dimensional gel images, mass spectra, protein sequences, and post-translational modifications. These databases lay the foundation for identifying proteins in proteomic studies that aim to find differentially expressed proteins, thus moving proteomics from the construction of "descriptive" databases to the design of "functional" experiments. In the quest for differentially expressed proteins, data have to be compared between two or more experimental groups. Therefore, a new field of bioinformatic tools had to be developed for proteomic high-throughput technologies, or adapted from other applications such as genomic and transcriptomic analysis based on nucleotide microarrays. This review evaluates strategies for comparing protein concentration and functional activity by different statistical means, as well as the methodology for comparing whole sets of genes or proteins. It comments on the statistical algorithms incorporated in 2D gel analysis software and discusses alternatives for data comparison. It also reviews the use of supervised and unsupervised data analysis in proteomic experiments, including hierarchical clustering for the identification of functional pathways in proteome analysis.
Keywords: Two-dimensional gel electrophoresis, proteomics, statistical analysis, microarray, hierarchical clustering, unsupervised data analysis, bioinformatics