Abstract
Background: Many applications in voice processing have high inherent parallelism. Field-programmable gate arrays (FPGAs) have shown very high performance, despite their low operating frequency, by fully exploiting this parallelism. Nevertheless, recent CPUs and graphics processing units (GPUs) also have an inherent potential for high performance.
Methods: Multi-core CPUs can exploit this parallelism through improved single-instruction multiple-data (SIMD) instructions, and recent GPUs provide a large number of cores with a potential for high performance in many applications. Our first goal is to compare GPU and FPGA implementations of the linear predictive coding (LPC) algorithm, in order to understand the trade-off between the flexibility but relatively low speed of an FPGA and the high speed but fixed architecture of the GPU. Our second goal is to apply several levels of optimization, from overlapping data transfers with computation to fine-tuning operation sequences.
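As an illustration of the transfer-overlap strategy mentioned above, the sketch below splits a batch of speech frames across two CUDA streams so that the copy of one half overlaps the kernel execution of the other. The autocorrelation kernel, frame sizes, and all names (autocorr, FRAME_LEN, LPC_ORDER) are hypothetical choices for this sketch, not the paper's actual implementation.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

#define N_FRAMES  64    // hypothetical number of speech frames per batch
#define FRAME_LEN 256   // hypothetical samples per frame
#define LPC_ORDER 10    // typical LPC order for speech

// Compute the first (LPC_ORDER + 1) autocorrelation lags of each frame;
// one block per frame, one thread per lag. Illustrative only.
__global__ void autocorr(const float *frames, float *r)
{
    int f = blockIdx.x;        // frame index within this half-batch
    int lag = threadIdx.x;     // lag index, 0..LPC_ORDER
    if (lag > LPC_ORDER) return;

    const float *x = frames + f * FRAME_LEN;
    float acc = 0.0f;
    for (int n = lag; n < FRAME_LEN; ++n)
        acc += x[n] * x[n - lag];
    r[f * (LPC_ORDER + 1) + lag] = acc;
}

int main(void)
{
    const int half = N_FRAMES / 2;
    size_t inBytes  = (size_t)N_FRAMES * FRAME_LEN * sizeof(float);
    size_t outBytes = (size_t)N_FRAMES * (LPC_ORDER + 1) * sizeof(float);

    // Pinned host memory is required for truly asynchronous copies.
    float *hIn, *hOut, *dIn, *dOut;
    cudaMallocHost((void **)&hIn, inBytes);
    cudaMallocHost((void **)&hOut, outBytes);
    cudaMalloc((void **)&dIn, inBytes);
    cudaMalloc((void **)&dOut, outBytes);
    for (int i = 0; i < N_FRAMES * FRAME_LEN; ++i)
        hIn[i] = (float)(i % 17) / 17.0f;  // dummy speech samples

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Process the batch in two halves on separate streams: the
    // host-to-device copy of one half overlaps the kernel of the other.
    for (int h = 0; h < 2; ++h) {
        size_t inOff  = (size_t)h * half * FRAME_LEN;
        size_t outOff = (size_t)h * half * (LPC_ORDER + 1);
        cudaMemcpyAsync(dIn + inOff, hIn + inOff,
                        inBytes / 2, cudaMemcpyHostToDevice, s[h]);
        autocorr<<<half, LPC_ORDER + 1, 0, s[h]>>>(dIn + inOff, dOut + outOff);
        cudaMemcpyAsync(hOut + outOff, dOut + outOff,
                        outBytes / 2, cudaMemcpyDeviceToHost, s[h]);
    }
    cudaDeviceSynchronize();

    printf("r[0] of frame 0 = %f\n", hOut[0]);

    cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
    cudaFree(dIn); cudaFree(dOut);
    cudaFreeHost(hIn); cudaFreeHost(hOut);
    return 0;
}
```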
Results: The experimental results highlight the relative strengths and limitations of the two systems.
Conclusion: Our experiments show that, for several speech-coding samples, the GPU achieves speedups of up to 3x over the FPGA and around 35x over a sequential execution.
Keywords: Compute unified device architecture (CUDA), FPGA, GPU, linear predictive coding, optimization strategies, shared memory.
Graphical Abstract