Polibits
Online version ISSN 1870-9044
Polibits no. 38, Mexico, Jul./Dec. 2008
Special section: natural language processing
Morpheme-based Language Model for Tamil Part-of-Speech Tagging
S. Lakshmana Pandian and T. V. Geetha
Department of Computer Science and Engineering, Anna University, Chennai, India. (lpandian72@yahoo.com).
Manuscript received May 12, 2008.
Manuscript accepted for publication October 25, 2008.
Abstract
This paper describes part-of-speech (POS) tagging for Tamil using a corpus-based approach that formulates a language model over the morpheme components of words. Rule-based tagging, Markov model taggers, Hidden Markov Model taggers, and transformation-based learning taggers are among the methods available for part-of-speech tagging. In this paper, we present a language model that categorizes a word's part of speech using information about its stem type, its last morpheme, and the morpheme preceding the last one. To estimate the contribution factors of the model, we use the generalized iterative scaling technique. The presented model achieves an overall F-measure of 96%.
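The abstract does not give the exact form of the model, so the following is only a minimal sketch under stated assumptions. It assumes a conditional log-linear (maximum-entropy) model with three feature templates (stem type, last morpheme, and the morpheme preceding it), one weight per observed (feature, tag) pair, fitted with the standard generalized iterative scaling (GIS) update λ ← λ + (1/C) log(E_empirical / E_model) with C = 3. The function names `context_features`, `predict_proba` and `train_gis`, the tag labels, and the romanized toy morphemes are all hypothetical illustrations, not the authors' notation or data.

```python
import math
from collections import defaultdict


# Hypothetical feature templates (an assumption, not the paper's notation):
# stem type, last morpheme, and the morpheme before it ("<none>" if absent).
def context_features(stem_type, last_morph, prev_morph):
    return [("stem", stem_type), ("last", last_morph), ("prev", prev_morph)]


def predict_proba(weights, ctx, tags):
    """Log-linear model: p(tag | ctx) is proportional to exp(sum of active weights)."""
    feats = context_features(*ctx)
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exp_scores.values())
    return {t: v / z for t, v in exp_scores.items()}


def train_gis(data, tags, iterations=100):
    """Fit one weight per observed (feature, tag) pair with generalized iterative scaling.

    data: list of ((stem_type, last_morph, prev_morph), tag) pairs.
    """
    C = 3  # at most three features are active for any (word, tag) pair
    n = len(data)

    # Empirical expectation of each (feature, tag) indicator in the training data.
    empirical = defaultdict(float)
    for ctx, tag in data:
        for f in context_features(*ctx):
            empirical[(f, tag)] += 1.0 / n
    weights = {key: 0.0 for key in empirical}

    for _ in range(iterations):
        # Model expectation of each indicator under the current weights.
        expected = defaultdict(float)
        for ctx, _gold in data:
            probs = predict_proba(weights, ctx, tags)
            for f in context_features(*ctx):
                for t, p in probs.items():
                    if (f, t) in weights:
                        expected[(f, t)] += p / n
        # GIS update: shift each weight by (1/C) * log(empirical / model expectation).
        for key in weights:
            weights[key] += math.log(empirical[key] / expected[key]) / C
    return weights


if __name__ == "__main__":
    # Toy romanized morphemes and tags, for illustration only.
    train = [
        (("noun_stem", "kaL", "<none>"), "NN"),
        (("noun_stem", "ai", "kaL"), "NN"),
        (("verb_stem", "aan", "kiR"), "VF"),
        (("verb_stem", "aaL", "kiR"), "VF"),
    ]
    tag_set = sorted({t for _, t in train})
    w = train_gis(train, tag_set, iterations=50)
    probs = predict_proba(w, ("verb_stem", "aan", "kiR"), tag_set)
    print(max(probs, key=probs.get))  # prints VF
```

Only feature-tag pairs seen in training receive a weight, so some (word, tag) pairs activate fewer than C features; the GIS update with C as an upper bound still increases the training likelihood at every step. A per-feature weighting like this is the usual maximum-entropy parameterization; the paper's "contribution factors" may instead be one weight per morpheme component, which this sketch does not attempt to reproduce.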
Key words: Bayesian learning, language model, morpheme components, generalized iterative scaling.