Abstract
Background: Part-of-speech (POS) tagging is the process of assigning the appropriate part of speech to each word in its context, for example determining whether a word is a verb, a noun, or a particle. POS tagging is an important preprocessing step in many Natural Language Processing (NLP) applications, such as question answering, text summarization, and information retrieval.
Objectives: The performance of NLP applications depends on the accuracy of POS taggers, since assigning the correct tag to each word in a sentence allows downstream processing to work properly. Many approaches have been proposed for the Arabic language, but further investigation is needed to improve the efficiency of Arabic POS taggers.
Methods: In this study, we propose a supervised POS tagging system for the Arabic language that combines Particle Swarm Optimization (PSO), Genetic Algorithms (GA), and a Hidden Markov Model (HMM). Tagging is treated as an optimization problem and modeled as a swarm of particles, where each particle represents a sequence of tags. The PSO algorithm searches for the best sequence of tags, which represents the correct tags for the sentence. The genetic operators, crossover and mutation, are used to compute the personal best, global best, and velocity in the PSO algorithm. The HMM is used to compute the fitness of each particle in the swarm.
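The search described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this study: the three-tag set, the transliterated words, all probabilities, and the names `fitness`, `crossover`, `mutate`, and `pso_tag` are hypothetical, and the way crossover and mutation stand in for the velocity update is one plausible reading of the method, stated here as an assumption.

```python
import math
import random

# Hypothetical toy HMM (illustrative values only, not taken from KALIMAT).
TAGS = ["NOUN", "VERB", "PART"]
START = {"NOUN": 0.3, "VERB": 0.5, "PART": 0.2}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.2, "PART": 0.5},
    "VERB": {"NOUN": 0.7, "VERB": 0.1, "PART": 0.2},
    "PART": {"NOUN": 0.8, "VERB": 0.1, "PART": 0.1},
}
EMIT = {
    "NOUN": {"kataba": 0.1, "alwaladu": 0.8, "fi": 0.1},
    "VERB": {"kataba": 0.9, "alwaladu": 0.05, "fi": 0.05},
    "PART": {"kataba": 0.05, "alwaladu": 0.05, "fi": 0.9},
}
FLOOR = 1e-6  # assumed smoothing value for unseen events


def fitness(tags, words):
    """HMM log-probability of a tag sequence for the given words."""
    score = 0.0
    prev = None
    for tag, word in zip(tags, words):
        trans = START.get(tag, FLOOR) if prev is None else TRANS[prev].get(tag, FLOOR)
        score += math.log(trans) + math.log(EMIT[tag].get(word, FLOOR))
        prev = tag
    return score


def crossover(a, b, rng):
    """One-point crossover: prefix of a, suffix of b."""
    if len(a) < 2:
        return list(a)
    point = rng.randrange(1, len(a))
    return list(a[:point]) + list(b[point:])


def mutate(seq, rate, rng):
    """Replace each tag with a random tag with probability `rate`."""
    return [rng.choice(TAGS) if rng.random() < rate else t for t in seq]


def pso_tag(words, n_particles=20, iterations=30, rate=0.1, seed=0):
    """Search for a high-fitness tag sequence with a discrete, GA-style PSO."""
    rng = random.Random(seed)
    swarm = [[rng.choice(TAGS) for _ in words] for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    gbest = list(max(pbest, key=lambda s: fitness(s, words)))
    for _ in range(iterations):
        for i, particle in enumerate(swarm):
            # Assumed movement rule: pull toward personal and global best
            # via crossover, then perturb with mutation.
            child = crossover(particle, pbest[i], rng)
            child = crossover(child, gbest, rng)
            child = mutate(child, rate, rng)
            swarm[i] = child
            if fitness(child, words) > fitness(pbest[i], words):
                pbest[i] = list(child)
            if fitness(child, words) > fitness(gbest, words):
                gbest = list(child)
    return gbest


if __name__ == "__main__":
    sentence = ["kataba", "alwaladu", "fi"]
    print(pso_tag(sentence))
```

Because the random generator is seeded, repeated calls with the same arguments return the same tag sequence. The search is heuristic, so the returned sequence is not guaranteed to be the highest-scoring one under the HMM.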
Results: The performance of the proposed approach is evaluated on the KALIMAT dataset, which consists of 18 million words, using a tag set of 45 tags that covers all Arabic POS tags. The proposed tagger achieved an accuracy of 90.5%.
Conclusion: Experimental results show that the proposed tagger achieves promising results compared with four existing approaches. Other approaches can identify only three tags: noun, verb, and particle. In addition, the accuracy of the proposed tagger on some tags exceeds that achieved by other approaches.
Keywords: Arabic computational linguistics, Arabic text analysis, evolutionary computing, genetic algorithm, POS tagging, PSO algorithm, swarm intelligence algorithms.