
Recent Advances in Computer Science and Communications


ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

Part-of-Speech Tagging for Arabic Text using Particle Swarm Optimization and Genetic Algorithm

Author(s): Ahmad T. Al-Taani* and Fadi A. ALkhazaaleh

Volume 15, Issue 5, 2022

Published on: 14 January, 2021

Article ID: e060422190303 Pages: 7

DOI: 10.2174/2666255814666210114120558


Abstract

Background: Part-of-speech (POS) tagging assigns each word in a sentence the part of speech that fits its context, for example deciding whether a word is a verb, a noun, or a particle. It is an important preprocessing step in many Natural Language Processing (NLP) applications, such as question answering, text summarization, and information retrieval.

Objectives: The performance of NLP applications depends on the accuracy of the POS tagger, because the processing that follows tagging relies on each word in a sentence carrying the right tag. Many approaches have been proposed for the Arabic language, but further investigation is needed to improve the efficiency of Arabic POS taggers.

Methods: In this study, we propose a supervised POS tagging system for the Arabic language that combines Particle Swarm Optimization (PSO), Genetic Algorithm (GA) operators, and a Hidden Markov Model (HMM). Tagging is treated as an optimization problem and modeled as a swarm of particles, where each particle represents a sequence of tags. The PSO algorithm is applied to find the best sequence of tags, which represents the correct tags for the sentence. The genetic operators, crossover and mutation, are used to compute the personal best, global best, and velocity of the PSO algorithm. The HMM is used to compute the fitness of each particle in the swarm.
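The idea described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-tag set, the transliterated toy sentence, all HMM probabilities, the function names, and the way crossover and mutation stand in for the velocity update are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical three-tag set and toy HMM parameters (not taken from the paper).
TAGS = ["noun", "verb", "particle"]
START = {"noun": 0.5, "verb": 0.3, "particle": 0.2}
TRANS = {
    "noun": {"noun": 0.3, "verb": 0.4, "particle": 0.3},
    "verb": {"noun": 0.6, "verb": 0.1, "particle": 0.3},
    "particle": {"noun": 0.7, "verb": 0.2, "particle": 0.1},
}
EMIT = {
    "noun": {"al-waladu": 0.5, "al-darsa": 0.5},
    "verb": {"kataba": 0.8},
    "particle": {"fi": 0.9},
}
UNSEEN = 1e-6  # assumed smoothing value for word/tag pairs missing from EMIT


def fitness(words, tags):
    """HMM log-probability of a tag sequence for the given words."""
    score = math.log(START[tags[0]]) + math.log(EMIT[tags[0]].get(words[0], UNSEEN))
    for i in range(1, len(words)):
        score += math.log(TRANS[tags[i - 1]][tags[i]])
        score += math.log(EMIT[tags[i]].get(words[i], UNSEEN))
    return score


def crossover(a, b, rng):
    """Uniform crossover: each position is copied from a or b at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(seq, rng, rate=0.1):
    """Replace each tag with a random tag with probability `rate`."""
    return [rng.choice(TAGS) if rng.random() < rate else t for t in seq]


def tag_sentence(words, n_particles=20, n_iters=30, seed=0):
    """Search for a high-fitness tag sequence with a PSO-style loop."""
    rng = random.Random(seed)
    # Each particle is one candidate tag sequence for the sentence.
    swarm = [[rng.choice(TAGS) for _ in words] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    gbest = max(pbest, key=lambda p: fitness(words, p))[:]
    for _ in range(n_iters):
        for i, particle in enumerate(swarm):
            # Assumed "velocity" step: pull the particle toward its personal
            # best and the global best via crossover, then mutate.
            candidate = crossover(particle, pbest[i], rng)
            candidate = crossover(candidate, gbest, rng)
            candidate = mutate(candidate, rng)
            swarm[i] = candidate
            if fitness(words, candidate) > fitness(words, pbest[i]):
                pbest[i] = candidate[:]
            if fitness(words, candidate) > fitness(words, gbest):
                gbest = candidate[:]
    return gbest


if __name__ == "__main__":
    sentence = ["kataba", "al-waladu", "al-darsa"]
    print(tag_sentence(sentence))
```

In this sketch the search space is tiny, so exhaustive search would also work; the swarm formulation only becomes relevant with longer sentences and a larger tag set such as the 45 tags used in the study.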

Results: The proposed approach was evaluated on the KALIMAT dataset, which consists of 18 million words, using a tag set of 45 tags that covers all Arabic POS tags. The proposed tagger achieved an accuracy of 90.5%.

Conclusion: Experimental results showed that the proposed tagger achieved promising results compared with four existing approaches. Other approaches can identify only three tags: noun, verb, and particle. In addition, for some tags the proposed tagger achieved higher accuracy than the other approaches.

Keywords: Arabic computational linguistics, Arabic text analysis, evolutionary computing, genetic algorithm, POS tagging, PSO algorithm, swarm intelligence algorithms.


