SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

PALSHIKAR, Girish K.; PAWAR, Sachin; SRIVASTAVA, Rajiv and SHAH, Mahek. Identifying Repeated Sections within Documents. Comp. y Sist. [online]. 2019, vol.23, n.3, pp.819-828. Epub 09-Aug-2021. ISSN 2007-9737. https://doi.org/10.13053/cys-23-3-3263.

Identifying sections that contain logically coherent text about a particular aspect is important for fine-grained information retrieval (IR), question answering, and information extraction. We propose the novel problem of identifying repeated sections, such as project details in resumes or different sports events in the transcript of a news broadcast. We focus on resumes and present four techniques (two unsupervised, two supervised) for automatically identifying repeated project sections. The knowledge-based method closely follows the way a human would identify such sections. The other methods are based on integer linear programming and sequence labeling. The proposed techniques are general and can be used to identify other kinds of repeated sections (and even non-repeating sections) in different types of documents. We compared the four methods on a dataset of resumes of IT professionals and also evaluated the benefits of identifying such repeated sections in practical IR tasks. To the best of our knowledge, this paper is the first to propose and solve the problem of repeated section identification.
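
As a rough illustration of the sequence-labeling framing mentioned in the abstract, the sketch below turns per-line labels into repeated-section spans. The abstract does not specify the label scheme or the model, so the B/I/O tags (B = first line of a project section, I = continuation, O = outside) and the function name `spans_from_bio` are assumptions made for this example only, not the authors' implementation.

```python
from typing import List, Tuple


def spans_from_bio(labels: List[str]) -> List[Tuple[int, int]]:
    """Group per-line B/I/O labels into (start, end) line spans, end inclusive.

    Hypothetical helper: the B/I/O scheme is an assumption for illustration,
    not something stated in the abstract.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index of the first line of the section currently open
    for i, label in enumerate(labels):
        if label == "B":
            # A new section begins; close the previous one if it is still open.
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif label == "I":
            # Continuation line; tolerate an I that has no preceding B.
            if start is None:
                start = i
        elif label == "O":
            # Line outside any section; close the open section, if any.
            if start is not None:
                spans.append((start, i - 1))
                start = None
        else:
            raise ValueError(f"unexpected label at line {i}: {label!r}")
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans


# Eight resume lines, labels as a trained tagger might emit them.
example_labels = ["O", "B", "I", "I", "B", "I", "O", "B"]
example_spans = spans_from_bio(example_labels)
```

Each returned span corresponds to one occurrence of the repeated section, so adjacent project descriptions are kept apart because every B label opens a new span.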

Keywords: Section identification; fine-grained IR; resume searching.
