Deep Learning Based Deep Level Tagger for Malayalam

Ajees A. P.; Sumam M. Idicula

doi:10.2174/2213275912666190204133657

Abstract

Background: Part-of-speech (POS) tagging is the process of identifying the correct grammatical category of each word based on its meaning and context in a text document. It is one of the preliminary steps in processing natural language text. Any error in POS tagging propagates to the NLP applications built on top of it, so the task must be handled precisely.

Aim: The purpose of this study is to develop a deep level tagger for Malayalam that indicates the semantics of nouns and verbs in a text document.

Methods: The proposed model is a two-tier architecture that combines deep learning with rule-based approaches. The first tier consists of a tagging model, which is trained using a tagged corpus of 287,000 words. To improve the depth of tagging, a suffix stripper is also used, which provides morphological features to the shallow machine learning model.
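
The role of the suffix stripper can be illustrated with a minimal Python sketch. The abstract does not give the rule set, the tag names, or the feature encoding, so everything named here is an assumption: `SUFFIX_RULES` is a small illustrative table of romanized case suffixes, `strip_suffix` and `refine_tag` are hypothetical helpers, and a direct rule lookup stands in for the shallow machine learning model that consumes the morphological features in the proposed system.

```python
# Hypothetical suffix table: (romanized suffix, morphological feature).
# Illustrative only; the actual rule set is not given in the abstract.
SUFFIX_RULES = [
    ("ude", "GENITIVE"),
    ("kku", "DATIVE"),
    ("aal", "INSTRUMENTAL"),
    ("il", "LOCATIVE"),
]


def strip_suffix(word, rules=SUFFIX_RULES, min_stem=2):
    """Return (stem, feature) for the longest matching suffix.

    Falls back to (word, None) when no rule applies or the
    remaining stem would be shorter than min_stem characters.
    """
    # Try longer suffixes first so a short suffix cannot shadow a long one.
    for suffix, feature in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], feature
    return word, None


def refine_tag(word, coarse_tag):
    """Second-tier stand-in: deepen a coarse NOUN tag with a suffix feature."""
    if coarse_tag != "NOUN":
        return coarse_tag
    _, feature = strip_suffix(word)
    return f"NOUN_{feature}" if feature else coarse_tag
```

Under these assumptions, `refine_tag("veettil", "NOUN")` attaches the locative feature to the coarse tag, while a word tagged as anything other than `NOUN` passes through unchanged. In a closer approximation of the described architecture, the feature returned by `strip_suffix` would be one input among several to a trained classifier rather than the sole deciding rule.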

Results: The system is trained on 230,000 words and tested on 57,000 words. The tagging accuracy of the phase-1 architecture is 92.03%, and that of the phase-2 architecture is 98.11%. The overall tagging accuracy is 91.82%.
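
As a rough illustration of how such figures are typically obtained, the sketch below computes accuracy as the percentage of words whose predicted tag matches the gold tag, and checks the size of the train/test split. The abstract does not define its accuracy measure, so the word-level definition and the helper name `tagging_accuracy` are assumptions; only the word counts come from the text.

```python
def tagging_accuracy(gold, predicted):
    """Percentage of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Word counts reported in the abstract.
TRAIN_WORDS = 230_000
TEST_WORDS = 57_000

# Share of the full corpus held out for testing (roughly one fifth).
test_share = TEST_WORDS / (TRAIN_WORDS + TEST_WORDS)
```

The two counts sum to the 287,000-word corpus mentioned under Methods, with about 20% of it held out for testing.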

Conclusion: The distinguishing feature of the proposed tagger is the depth at which it tags nouns. This deep level information can be used in various semantic processing applications for natural language text, such as anaphora resolution, text summarization, and machine translation.

Keywords: POS tagging, Malayalam, deep level tagging, LSTM, sequence-to-sequence learning, word embeddings, MLP.

DOI: https://dx.doi.org/10.2174/2213275912666190204133657
Publisher: Bentham Science Publisher
Print ISSN: 2666-2558
Online ISSN: 2666-2566

Recent Advances in Computer Science and Communications
