Parts of Speech Tagger for Pali Language

Parts of Speech tagging is the process of labelling each word in a text with its appropriate grammatical category, such as noun, verb, adjective, adverb or pronoun. It is an essential requirement for natural language processing and is among the simplest statistical models used in many Natural Language Processing applications. In this paper, we propose a parts of speech tagger for the Pali language. Pali, an Indo-Aryan language, though considered extinct, has a very rich literature comprising works on Logic, History, Medicine, Pharmacology, etc. The general approach used for developing the Pali tagger is a rule-based approach. The paper also presents the tagset used for the Pali language. It reports the performance of the proposed rule-based tagger on a dataset of up to 300 sentences / 1000 words. The learning algorithms Support Vector Machine and Decision Tree have been used to measure performance on the Pali tagged corpus.
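As a rough illustration of what a rule-based tagger can look like, the following minimal Python sketch combines a small closed-class lexicon with ordered suffix rules. The tag names, the lexicon entries and the suffix rules are all assumptions chosen for illustration; they are not the tagset or the rules used in the paper.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that 'ṃ' compares equal however it was typed."""
    return unicodedata.normalize("NFC", text)


# Hypothetical closed-class lexicon (illustrative, not the paper's tagset).
LEXICON = {nfc(word): tag for word, tag in {
    "ca": "CONJ",    # "and"
    "na": "PART",    # "not"
    "so": "PRON",    # "he"
    "ahaṃ": "PRON",  # "I"
    "tvaṃ": "PRON",  # "you"
}.items()}

# Hypothetical suffix rules, checked in order (longer suffixes first).
SUFFIX_RULES = [(nfc(suffix), tag) for suffix, tag in [
    ("nti", "VERB"),  # e.g. gacchanti, present tense 3rd person plural
    ("ti", "VERB"),   # e.g. gacchati, present tense 3rd person singular
    ("ṃ", "NOUN"),    # e.g. dhammaṃ, accusative singular
    ("o", "NOUN"),    # e.g. buddho, nominative singular
]]


def tag_word(word):
    """Return a tag for one word: lexicon lookup first, then suffix rules."""
    w = nfc(word.lower())
    if w in LEXICON:
        return LEXICON[w]
    for suffix, tag in SUFFIX_RULES:
        # Require at least one character of stem before the suffix.
        if w.endswith(suffix) and len(w) > len(suffix):
            return tag
    return "UNK"  # no rule applies


def tag_sentence(sentence):
    """Tag a whitespace-tokenized sentence, returning (word, tag) pairs."""
    return [(word, tag_word(word)) for word in sentence.split()]


if __name__ == "__main__":
    print(tag_sentence("buddho dhammaṃ deseti"))
```

Rules this coarse over-generate (any word ending in "ti" is treated as a verb, for instance), so a practical tagger would need a much larger lexicon and more specific rules; the sketch only shows the lookup-then-suffix control flow.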
