Generic placeholder image

Recent Advances in Computer Science and Communications

Editor-in-Chief

ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

Deep Learning Based Deep Level Tagger for Malayalam

Author(s): Ajees A. P.* and Sumam M. Idicula

Volume 14, Issue 1, 2021

Published on: 04 February, 2019

Page: [181 - 191] Pages: 11

DOI: 10.2174/2213275912666190204133657

Price: $65

Abstract

Background: POS tagging is the process of identifying the correct grammatical category of words based on its meaning and context in a text document. It is one of the preliminary steps in the processing of natural language text. If any error happens in POS tagging the same will be propagated to whole NLP applications. Hence it must be handled in a genuine and precise way.

Aim: The purpose of this study is to develop a deep level tagger for Malayalam which indicates the semantics of nouns and verbs in a text document.

Methods: The proposed model is a two-tier architecture consisting of deep learning as well as rulebased approaches. The first tier consists of a tagging model, which is trained by a tagged corpus of 287,000 words. To improve the depth of tagging a suffix stripper is also used which can provide morhological features to the shallow machine learning model.

Results: The system is trained on 2,30,000 words and tested on 57,000 words. The accuracy of tagging for the phase-1 architecture is 92.03%. Similarly the accuracy of phase-2 architecture is 98.11%. The overall accuracy of tagging is 91.82%.

Conclusion: The exclusive feature of the proposed tagger is its depth in tagging the noun words. This deep level information can be used in various semantic processing applications of the natural language text like anaphora resolution, text summarization, machine translation, etc.

Keywords: POS tagging, malayalam, deep level tagging, LSTM, sequence-to-sequence learning, word embeddings, MLP.

Graphical Abstract


Rights & Permissions Print Cite
© 2024 Bentham Science Publishers | Privacy Policy