Abstract
The rich morphology of Uyghur poses a challenge to state-of-the-art speech recognition systems. This paper describes our work on Uyghur speech recognition technology and a Uyghur voice search application, together with recent patents on them. Firstly, we introduce the morphology of Uyghur and Uyghur speech phenomena. Secondly, we investigate the use of morphemes in Uyghur automatic speech recognition. When speech phenomena occur, variant surface forms of a morpheme are produced. Thirdly, we describe a new approach that uses morphological rules to model these speech phenomena. In this approach, variant surface forms are replaced by their corresponding original stems, which yields a better pronunciation lexicon and a more robust language model in Uyghur speech recognition experiments. Finally, we apply the new morpheme-based speech recognition approach to a Uyghur voice search application. Experimental results show that this approach gives the best results in both the speech recognition experiments and the voice search experiments.
Keywords: Morpheme, machine learning, pronunciation variation, Uyghur, voice search application.
Graphical Abstract