SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

RADHAKRISHNAN, Priya; JAWAHAR, Ganesh; GUPTA, Manish and VARMA, Vasudeva. SNEIT: Salient Named Entity Identification in Tweets. Comp. y Sist. [online]. 2017, vol.21, n.4, pp.665-679. ISSN 2007-9737. https://doi.org/10.13053/cys-21-4-2864.

Social media is a rich source of information and opinion, with data growing at an exponential rate. However, social media posts are difficult to analyze because they are brief, unstructured, and noisy. Many social media posts are about one or more entities, and knowing which entity is central to a post (the Salient Entity) helps in analyzing it. In this paper we propose a supervised machine-learning model that identifies the Salient Entity in a social media post, specifically a tweet, and we propose that the tweet is most likely about that entity. To build a dataset of tweets and salient entities, we rely on the premise that when an image accompanies a text, the text is most likely about the entity in that image. We trained our model on this dataset. This does not restrict the applicability of the model: tweets with images are used only to obtain objective ground-truth data, while the features for the model are derived from the tweet text. Our experiments show that the model identifies the Salient Named Entity with an F-measure of 0.63. We show the effectiveness of the proposed model on tweet-filtering and salience-identification tasks. We have made the human-annotated dataset and the source code of the model publicly available.
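
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure is computed from true-positive, false-positive, and false-negative counts. The abstract does not state which variant is used; the balanced F1 (beta = 1) is assumed here. The function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts (beta=1 gives the balanced F1).

    tp: entities correctly marked salient
    fp: entities wrongly marked salient
    fn: salient entities the model missed
    """
    if tp == 0:
        # With no true positives the score is defined as zero here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (assumption, not data from the paper).
score = f_measure(tp=6, fp=2, fn=4)
```

With these illustrative counts, precision is 0.75 and recall is 0.6, so the balanced score falls between the two and is pulled toward the lower value.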

Keywords: Entity salience; named entity recognition; semantic search; named entity extraction.
