Abstract
There is substantial evidence that functional and structural analyses of genes and proteins can help develop new drugs, diagnose medical conditions, and find cures for diseases. However, these biological sequence analyses require large-scale computational power because of the exponential growth of genomic information. Over the past two decades, considerable effort has been devoted to accelerating biological sequence database search, which is the fundamental step for further analyses. Various software approaches, including SIMD (Single Instruction, Multiple Data) instructions, multithreading, the message-passing programming paradigm, and I/O optimization, have been employed to speed up the process on different computing platforms at different levels. Hardware techniques based on FPGAs (Field-Programmable Gate Arrays), GPUs (Graphics Processing Units), the IBM Cell BE, DSPs (Digital Signal Processors), and ASICs (Application-Specific Integrated Circuits) have also been widely used. This paper reviews the relevant computing platforms, the various software and hardware approaches, and the performance they have achieved in high-throughput sequence database search. It demonstrates that parallelism can be exploited at different phases, granularity levels, types, software/hardware levels, and scopes. This should help researchers understand current development strategies and possible future trends, such as aggregate heterogeneous systems, in high-performance biological sequence analysis.
Keywords: Smith-Waterman algorithm, SIMD, FPGA, GPU, DSP, heterogeneous computing, biological sequence analysis