Abstract
Background: Parsing of English has achieved effective results over the last few decades. However, parsing a difficult language such as Arabic remains a major challenge, since Arabic has a rich morphology and complex linguistic characteristics not found in other languages. Although parsing systems for Arabic have been developed recently, most of them do not support deeper processing of Arabic sentences, such as an effective dependency analysis that identifies, for example, the relative clauses in these sentences.
Objective: This paper develops a new framework and system that support parsing Arabic sentences and writing well-formed Arabic relative clauses.
Method: The developed framework learns the grammar rules for Arabic relative clauses using machine learning, in particular Inductive Logic Programming (ILP). A corpus of Arabic sentences containing relative clauses was compiled from the Quran and used in the experiments reported in this research. The sentences in this corpus were first processed with the Natural Language Processing (NLP) toolkit Stanford CoreNLP and then passed to the ILP system ALEPH, which automatically learned a grammar for Arabic relative clauses. Finally, a system was developed to extract Arabic relative clauses from Arabic sentences based on the rules produced by ALEPH.
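The hand-off between the parser and the ILP system described above can be illustrated with a small sketch. ALEPH takes its background knowledge as Prolog facts, so parser output has to be rewritten as clauses at some point. The abstract does not specify the fact format. The predicate names `word/4` and `dep/4`, the `Token` record, and the example tags and dependency labels below are all hypothetical, hand-written for illustration, and are not actual Stanford CoreNLP output or the authors' encoding.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int     # 1-based position of the token in the sentence
    word: str      # surface form
    pos: str       # part-of-speech tag (illustrative)
    head: int      # index of the governing token (0 = root)
    relation: str  # dependency label (illustrative)


def prolog_atom(text: str) -> str:
    """Quote a string as an ISO Prolog atom: escape backslashes, double single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def sentence_to_facts(sent_id: str, tokens: list[Token]) -> list[str]:
    """Turn one parsed sentence into hypothetical word/4 and dep/4 background facts."""
    facts = []
    for t in tokens:
        facts.append(f"word({sent_id}, {t.index}, {prolog_atom(t.word)}, {prolog_atom(t.pos)}).")
        facts.append(f"dep({sent_id}, {prolog_atom(t.relation)}, {t.head}, {t.index}).")
    return facts


# Hand-labelled toy example: "الرجل الذي جاء" ("the man who came").
# Tags and labels are assumptions made for illustration only.
example = [
    Token(1, "الرجل", "DTNN", 0, "root"),
    Token(2, "الذي", "WP", 3, "nsubj"),
    Token(3, "جاء", "VBD", 1, "rcmod"),
]

if __name__ == "__main__":
    for fact in sentence_to_facts("s1", example):
        print(fact)
```

Each sentence gets its own identifier (`s1` here), so the ILP learner can relate tokens and dependencies within a single sentence. Positive and negative examples of relative clauses would be supplied to ALEPH in separate files. They are not shown in this sketch.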
Results: An empirical evaluation of the developed system achieved promising results, with an overall accuracy of 83%.
Conclusion: Our results lead us to conclude that the developed system is able to perform deeper dependency parsing of Arabic text and to identify relative clauses in Arabic sentences.
Keywords: Natural language processing, Arabic language processing, Arabic parsing, machine learning, inductive logic programming, dependency parsing, Arabic relative clauses.
Graphical Abstract