Online version ISSN 1870-9044
Polibits, no. 46, México, July/December 2012
Lexical Disambiguation of Arabic Language: An Experimental Study
Laroussi Merhbene, Anis Zouaghi, and Mounir Zrigui
Unité de Recherche en Technologies de l'Information et de la Communication of the Réseau National Universitaire Tunisien, Tunisia
Manuscript received June 18, 2012.
Manuscript accepted for publication July 24, 2012.
In this paper, we test several supervised algorithms that are widely cited in existing work on word sense disambiguation. Because linguistic resources for Arabic are scarce, we work with a non-annotated corpus; with the help of four annotators, we annotated the samples containing the ambiguous words. We then test the Naïve Bayes algorithm, decision lists, and the exemplar-based algorithm. In the experimental study, we evaluate how disambiguation quality is affected by the window size, by derivation, and by the smoothing technique used for the (2n+1)-grams. In these tests, the exemplar-based algorithm achieves the highest precision.
Keywords: Supervised algorithms, training data, Naïve Bayes, decision list, exemplar-based algorithm, window size.
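To make the three supervised methods named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation, and several assumptions are ours: features are the bag of words inside a (2n+1)-token window around the ambiguous word; Naïve Bayes uses add-alpha (Laplace) smoothing; the decision list follows the Yarowsky-style log-likelihood ranking; and the exemplar-based method is simplified to k-nearest neighbours with a plain feature-overlap similarity, which may differ from the metric used in the paper. All names (`context_window`, `features`, `NaiveBayesWSD`, `DecisionListWSD`, `ExemplarWSD`, `TRAIN`) and the toy English sentences are hypothetical stand-ins for the annotated Arabic samples.

```python
import math
from collections import Counter, defaultdict

def context_window(tokens, i, n):
    """Return the (2n+1)-gram centred on tokens[i], padded at sentence edges."""
    padded = ["<s>"] * n + tokens + ["</s>"] * n
    j = i + n  # index of the target word inside `padded`
    return padded[j - n : j + n + 1]

def features(sentence, target, n):
    """Context words of the window around `target`; the target itself is dropped."""
    tokens = sentence.split()
    window = context_window(tokens, tokens.index(target), n)
    return [w for k, w in enumerate(window) if k != n]

class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-alpha smoothing (Laplace when alpha=1)."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples):
        self.prior = Counter(sense for _, sense in samples)
        self.counts = defaultdict(Counter)  # sense -> word -> count
        for feats, sense in samples:
            self.counts[sense].update(feats)
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, feats):
        total, v = sum(self.prior.values()), len(self.vocab)
        def log_score(sense):
            n_s = sum(self.counts[sense].values())
            score = math.log(self.prior[sense] / total)
            for w in feats:  # smoothed P(w | sense), so unseen words never give log(0)
                score += math.log((self.counts[sense][w] + self.alpha) / (n_s + self.alpha * v))
            return score
        return max(self.prior, key=log_score)

class DecisionListWSD:
    """Yarowsky-style decision list: one rule per feature, ranked by smoothed log ratio."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def fit(self, samples):
        by_feature = defaultdict(Counter)  # word -> sense -> samples containing word
        for feats, sense in samples:
            for w in set(feats):
                by_feature[w][sense] += 1
        self.default = Counter(s for _, s in samples).most_common(1)[0][0]
        rules = []
        for w, senses in by_feature.items():
            sense, c = senses.most_common(1)[0]
            rest = sum(senses.values()) - c
            rules.append((math.log((c + self.alpha) / (rest + self.alpha)), w, sense))
        self.rules = sorted(rules, reverse=True)  # strongest evidence first
        return self

    def predict(self, feats):
        present = set(feats)
        for _, w, sense in self.rules:
            if w in present:
                return sense  # the highest-ranked matching rule decides
        return self.default  # fall back to the most frequent sense

class ExemplarWSD:
    """k-nearest neighbours over stored exemplars, similarity = number of shared features."""
    def __init__(self, k=1):
        self.k = k

    def fit(self, samples):
        self.exemplars = [(set(feats), sense) for feats, sense in samples]
        return self

    def predict(self, feats):
        query = set(feats)
        ranked = sorted(self.exemplars, key=lambda e: len(query & e[0]), reverse=True)
        return Counter(sense for _, sense in ranked[: self.k]).most_common(1)[0][0]

# Hypothetical toy data: English stand-ins, not the authors' annotated Arabic samples.
N = 2  # window half-width n, i.e. 5-token windows
TRAIN = [
    ("he deposited money at the bank yesterday", "finance"),
    ("the bank approved the loan quickly", "finance"),
    ("they fished from the river bank all day", "river"),
    ("grass grew on the muddy bank of the river", "river"),
]
samples = [(features(s, "bank", N), sense) for s, sense in TRAIN]
nb, dl, eb = NaiveBayesWSD().fit(samples), DecisionListWSD().fit(samples), ExemplarWSD().fit(samples)
q1 = features("she opened an account at the bank", "bank", N)
q2 = features("we walked along the river bank slowly today", "bank", N)
for model in (nb, dl, eb):
    print(type(model).__name__, model.predict(q1), model.predict(q2))  # each: <name> finance river
```

Changing `N` is the kind of window-size sweep the abstract describes, and the `alpha` parameters are where a different smoothing scheme would plug in. The toy data are far too small to say anything about which method performs better.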