Computación y Sistemas
Online version ISSN 2007-9737; print version ISSN 1405-5546
Comp. y Sist. vol. 14 no. 2, Mexico City, Oct./Dec. 2010
Articles
Un método independiente del idioma para responder preguntas de definición
An Independent Language Method for Answer Definition Questions
Claudia Denicia Carral, Luis Villaseñor Pineda, Manuel Montes y Gómez
Language Technologies Laboratory, Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Tonantzintla, Puebla, Mexico. Email: cdenicia@inaoep.mx, villasen@inaoep.mx, mmontesg@inaoep.mx
Article received June 6, 2007.
Accepted April 17, 2009.
Abstract
This paper describes a method for answering definition questions that relies exclusively on lexical patterns and is therefore language independent. The method applies two text-mining steps. The first step discovers a set of surface lexical patterns from definition examples retrieved from the Web; these patterns are then used to extract a set of concept-description pairs from a given target document collection. The second step applies a text-mining algorithm to determine the most adequate answer to each specific question. Experimental results were obtained on the CLEF 2005 and 2006 datasets for the monolingual tasks in Spanish, French, and Italian. These results demonstrate the relevance of the method, which achieved high precision for all three languages.
Keywords: H. Information Systems, H.3 Information Storage and Retrieval, H.3.4 Systems and Software, Question-Answering Systems, Definition Questions.
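The two steps summarized in the abstract can be illustrated with a minimal sketch. It rests on strong simplifying assumptions: the two regular expressions in `PATTERNS` are hand-written stand-ins for the surface patterns that the method discovers automatically from Web examples, and answer selection is reduced to picking the most frequent description, not the text-mining algorithm used in the paper. The names `PATTERNS`, `DOCS`, `extract_pairs`, `build_catalog`, and `answer` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical hand-written surface patterns (assumption: the paper learns
# its patterns from Web definition examples; these two are only illustrative).
PATTERNS = [
    # "<Concept>, <description>,"  e.g. "Umberto Eco, the Italian writer,"
    re.compile(r"(?P<concept>[A-Z][\w.]*(?: [A-Z][\w.]*)*), (?P<description>[^,.]+),"),
    # "<ACRONYM> (<description>)"  e.g. "FAO (Food and Agriculture Organization)"
    re.compile(r"(?P<concept>[A-Z]+) \((?P<description>[^)]+)\)"),
]

# Toy document collection invented for this example.
DOCS = [
    "Umberto Eco, the Italian writer, published a new novel.",
    "The prize went to Umberto Eco, the Italian writer, last year.",
    "Umberto Eco, a professor in Bologna, gave a lecture.",
    "The FAO (Food and Agriculture Organization) issued a report.",
]


def extract_pairs(text):
    """Yield (concept, description) pairs matched by any surface pattern."""
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            yield match.group("concept"), match.group("description").strip()


def build_catalog(documents):
    """Step 1: collect concept-description pairs from the whole collection."""
    catalog = defaultdict(Counter)
    for text in documents:
        for concept, description in extract_pairs(text):
            catalog[concept][description] += 1
    return catalog


def answer(catalog, concept):
    """Step 2 (simplified): return the most frequent description, or None."""
    candidates = catalog.get(concept)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


catalog = build_catalog(DOCS)
print(answer(catalog, "Umberto Eco"))
print(answer(catalog, "FAO"))
```

Because the patterns operate only on surface strings (capitalization, commas, parentheses), no language-specific tool such as a tagger or parser is involved, which is the property the abstract refers to as language independence.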
Acknowledgments
The authors thank Alberto Téllez, Antonio Juárez, Esaú Villatoro, and Manuel Alberto Pérez for their valuable participation in developing the system that took part in the CLEF 2005 and 2006 evaluations. This work was supported by CONACYT (project ref. no. 43990 and scholarship 189692) and SNI-México. The authors also thank the EFE agency and CLEF for the resources provided and for the evaluation tasks of this work.