Revised N-Gram based Automatic Spelling Correction Tool to Improve Retrieval Effectiveness

Ahmed, Farag; Luca, Ernesto William De; Nürnberger, Andreas

Polibits

Online version ISSN 1870-9044

Polibits no. 40, México, Jul./Dec. 2009

Special section: Information Retrieval and Natural Language Processing

Farag Ahmed¹, Ernesto William De Luca², and Andreas Nürnberger¹

¹ Data and Knowledge Engineering Group, Institute for Knowledge and Language Engineering, Otto-von-Guericke University of Magdeburg, Germany.

² Competence Center Information Retrieval & Machine Learning, Distributed Artificial Intelligence Laboratory, Technical University of Berlin, Germany.

Manuscript received October 23, 2008.
Manuscript accepted for publication August 22, 2009.

Abstract

We present a language-independent spell checker based on an enhancement of the n-gram model. The spell checker proposes corrections by selecting the most promising candidates from a ranked list of correction candidates, which is derived from n-gram statistics and lexical resources. Besides motivating and describing the developed techniques, we briefly discuss the use of the proposed approach in an application for keyword-based and semantic-based search support. We also compared the proposed tool with state-of-the-art spelling correction approaches; in this evaluation it outperformed the other methods.

Key words: Spelling correction, n-gram, information retrieval effectiveness.
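
To make the idea of ranking correction candidates by n-gram statistics concrete, the following is a minimal sketch. It is not the revised method of the paper, whose details are not given in this abstract: it assumes plain character bigrams with boundary padding and the Dice coefficient as the similarity score. The function names, the padding symbol `#`, and the small lexicon are hypothetical and chosen only for illustration.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return the multiset of character n-grams of a word.

    The word is padded with '#' (an assumed boundary symbol) so that
    its first and last characters also contribute n-grams.
    """
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a, b, n=2):
    """Dice coefficient over the n-gram multisets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * shared / total if total else 0.0


def rank_candidates(misspelling, lexicon, n=2, top_k=3):
    """Rank lexicon entries by n-gram similarity to the misspelled word."""
    scored = [(dice_similarity(misspelling, w, n), w) for w in lexicon]
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for score, w in scored[:top_k] if score > 0]


# Hypothetical lexicon, for illustration only.
lexicon = ["retrieval", "retrieve", "relevant", "spelling", "correction"]
suggestions = rank_candidates("retreival", lexicon)
```

With this toy lexicon, the top-ranked suggestion for "retreival" is "retrieval". A real system would draw candidates from a full lexical resource and would likely combine the similarity score with further evidence such as word frequency.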
