Abstract
Recent progress in experimental techniques such as high-throughput genome sequencing, proteomics, transcriptomics and interactomics has led to a new demand for integrated computational analyses capable of systematically organizing these heterogeneous, fragmentary data into a coherent whole. As a consequence, novel system-level bioinformatics solutions are now being developed with the goal of understanding and predicting the behaviour of complex systems, such as molecular pathways, cells, tissues, organs and even whole organisms. Multiple alignments of both nucleotide and protein sequences play a central role in many of these applications, which range from the identification of genes and their products, via the characterisation of their 3D structure and their molecular and cellular functions, to the prediction of the phenotypic consequences of mutations, reverse engineering and drug design. In a multiple sequence alignment, structural and functional data can be combined with evolutionary information to allow reliable data validation, consensus predictions and rational propagation of information from known to unknown sequences. Clearly, integration at this scale calls for high-quality, automatic multiple alignments. Alignment techniques are now responding to the challenge, with current developments moving away from a single all-encompassing algorithm towards more cooperative, knowledge-based systems. However, the success of these methods relies on the efficient integration of information from different databases and the close cooperation of the different data mining and investigation algorithms. A large community effort is now underway to develop standards for data exchange and organisation that will facilitate collaborations between the various resources, in order to support improved domain understanding and to provide better decision-making systems and services for the biologist.
Keywords: Multiple sequence alignment, multiple alignment quality, systems biology, data integration