Polibits
On-line version ISSN 1870-9044
Polibits no. 40, México, Jul./Dec. 2009
Special section: Information Retrieval and Natural Language Processing
Revised N-Gram based Automatic Spelling Correction Tool to Improve Retrieval Effectiveness
Farag Ahmed1, Ernesto William De Luca2, and Andreas Nürnberger1
1 Data and Knowledge Engineering Group, Institute for Knowledge and Language Engineering, Otto-von-Guericke University of Magdeburg, Germany.
2 Competence Center Information Retrieval & Machine Learning, Distributed Artificial Intelligence Laboratory, Technical University of Berlin, Germany.
Manuscript received October 23, 2008.
Manuscript accepted for publication August 22, 2009.
Abstract
We present a language-independent spell checker based on an enhancement of the n-gram model. The spell checker proposes correction suggestions by selecting the most promising candidates from a ranked list of correction candidates derived from n-gram statistics and lexical resources. Besides motivating and describing the developed techniques, we briefly discuss the use of the proposed approach in an application for keyword- and semantic-based search support. In addition, we compared the proposed tool with state-of-the-art spelling correction approaches; in this evaluation it outperformed the other methods.
Key words: Spelling correction, n-gram, information retrieval effectiveness.
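The abstract does not spell out how candidates are scored, so the following is only a rough illustration of the general idea of ranking correction candidates by n-gram statistics. It uses a plain character-bigram Dice coefficient, which is a standard baseline and not the revised n-gram method of the paper. The function names and the tiny lexicon are hypothetical.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return a multiset of character n-grams; '#' pads both word boundaries."""
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient over the n-grams shared by two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * shared / total if total else 0.0


def rank_candidates(misspelled, lexicon, n=2, top_k=3):
    """Rank lexicon words by n-gram similarity to the misspelled word."""
    scored = [(dice_similarity(misspelled, w, n), w) for w in lexicon]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for score, w in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Hypothetical lexicon; a real system would draw on a lexical resource.
    lexicon = ["retrieval", "retrieve", "revival", "spelling", "correction"]
    print(rank_candidates("retreival", lexicon))
```

In this sketch the lexicon stands in for the lexical resources mentioned in the abstract, and the ranked list is simply truncated to the top few candidates.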