SciELO - Scientific Electronic Library Online

 


Polibits

On-line version ISSN 1870-9044

Polibits no. 49, México, Jan./Jun. 2014

 

Information Extraction in Semantic, Highly-Structured, and Semi-Structured Web Sources

 

Víctor M. Alonso-Rorís, Juan M. Santos Gago, Roberto Pérez Rodríguez, Carlos Rivas Costa, Miguel A. Gómez Carballa, and Luis Anido Rifón

 

Department of Telematics, University of Vigo, Spain (e-mails: victor.roris@det.uvigo.es, Juan.Santos@det.uvigo.es, carlosrivas@det.uvigo.es, miguelgomez@det.uvigo.es, lanido@det.uvigo.es, robertoperezrodriguez@gmail.com).

 

Manuscript received on January 7, 2014
Accepted for publication on February 28, 2014.

 

Abstract

The evolution of the Web since the original proposal made in 1989 can be considered one of the most revolutionary technological changes in centuries. Over the past 25 years, the Web has evolved from a static version into a fully dynamic and interoperable intelligent ecosystem, and the amount of data produced during these few decades is enormous. New applications, developed by individual developers or small companies, can take advantage of both the services and the data already present on the Web. These data, produced by humans and machines, may be available in different formats and through different access interfaces. This paper analyses three types of data sources available on the Web (semantic, highly structured, and semi-structured) and presents mechanisms for accessing and extracting their information. The authors show several applications that leverage the extracted information in two areas of research: recommendation of educational resources beyond content, and interactive digital TV applications.

Key words: Information extraction, web data processing, semantic enrichment, data mining, web scraping.

 


 

Acknowledgements

The work presented in this paper was partially supported by the European Regional Development Fund (ERDF); the Galician Regional Government under agreement for funding the Atlantic Research Center for Information and Communication Technologies (AtlanTIC); the Spanish Government and the European Regional Development Fund (ERDF) under project TACTICA; the European Commission's FP7 programme, project iTEC: innovative Technologies for an Engaging Classroom (Grant no. 257566); and the Spanish Ministry of Science and Innovation under grant "Methodologies, Architectures and Standards for adaptive and accessible e-learning (Adapt2Learn)" (TIN2010-21735-C02-01). The content of this paper is the sole responsibility of its authors and does not represent the opinion of the European Commission or the Spanish Ministry of Science and Innovation, which are not responsible for any use that might be made of the information contained herein.

 


All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.