SciELO - Scientific Electronic Library Online

 

Polibits

On-line version ISSN 1870-9044

Abstract

NGONGA NGOMO, Axel-Cyrille and SCHUMACHER, Frank. Disentangling the Wikipedia Category Graph for Corpus Extraction. Polibits [online]. 2009, n.39, pp.5-10. ISSN 1870-9044.

In several areas of research, such as knowledge management and natural language processing, domain-specific corpora are required for tasks such as terminology extraction and ontology learning. The investigations presented here are based on the assumption that Wikipedia can be used for corpus extraction. Wikipedia has the advantage of possessing a semantic layer, which should ease the extraction of domain-specific corpora. Yet, because the Wikipedia category graph is scale-free, it cannot be used as is for these purposes. In this paper, we propose a novel approach to graph clustering called BorderFlow, which we apply to and evaluate on the Wikipedia category graph. Additional possible applications of these results in the area of information retrieval are presented.

Keywords: Natural language processing; local graph clustering; corpus extraction.


 

Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.