SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

NHAT MINH, Pham Quang. A Feature-Rich Vietnamese Named Entity Recognition Model. Comp. y Sist. [online]. 2022, vol.26, n.3, pp.1323-1331. Epub 02-Dec-2022. ISSN 2007-9737. https://doi.org/10.13053/cys-26-3-4353.

In this paper, we present a feature-based named entity recognition (NER) model that achieves state-of-the-art accuracy for the Vietnamese language. We combine word and word-shape features, PoS tags, chunk tags, Brown-cluster-based features, and word-embedding-based features in a Conditional Random Fields (CRF) model. We also explore how the word segmentation, PoS tagging, and chunking output of many popular Vietnamese NLP toolkits affects the accuracy of the proposed feature-based NER model. To our knowledge, this is the first work that systematically performs an extrinsic evaluation of basic Vietnamese NLP toolkits on the downstream NER task. Experimental results show that, while automatically generated word segmentation is useful, the PoS and chunking information generated by Vietnamese NLP tools does not benefit the proposed feature-based NER model.
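As a rough illustration of how the feature groups named in the abstract might be combined for a CRF, the sketch below builds one feature dictionary per token. It is not the paper's feature template: the function names `word_shape` and `token_features`, the feature keys, the Brown-cluster prefix lengths (4 and 6), and the input layout (a list of dicts with a `word` key and optional `pos` and `chunk` keys) are all assumptions made for this example.

```python
import re
import unicodedata


def word_shape(token: str) -> str:
    """Coarse word shape: uppercase -> X, lowercase -> x, digit -> d.

    Other characters are kept, and runs of the same symbol are collapsed.
    """
    # Compose accented Vietnamese letters into single code points first.
    token = unicodedata.normalize("NFC", token)
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return re.sub(r"(.)\1+", r"\1", "".join(shape))


def token_features(sent, i, brown_clusters=None, embeddings=None,
                   prefix_lengths=(4, 6)):
    """Feature dict for token i of a word-segmented sentence (hypothetical layout)."""
    word = sent[i]["word"]
    key = word.lower()
    feats = {
        "bias": 1.0,
        "word.lower": key,
        "word.shape": word_shape(word),
    }
    # PoS and chunk tags are optional, so they can be switched on or off.
    for tag in ("pos", "chunk"):
        if tag in sent[i]:
            feats[tag] = sent[i][tag]
    # Brown clusters: prefixes of the cluster bit string (lengths are assumed).
    if brown_clusters and key in brown_clusters:
        for n in prefix_lengths:
            feats[f"brown.{n}"] = brown_clusters[key][:n]
    # Word embeddings: one real-valued feature per dimension.
    if embeddings and key in embeddings:
        for k, value in enumerate(embeddings[key]):
            feats[f"emb.{k}"] = float(value)
    # Neighbouring words, with sentence-boundary markers.
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(sent):
            feats[f"{offset:+d}:word.lower"] = sent[j]["word"].lower()
        else:
            feats["BOS" if offset < 0 else "EOS"] = True
    return feats
```

Because the PoS and chunk keys are optional, the same function can be run with and without those tags, which is one simple way to set up the kind of comparison the abstract describes. Per-token dictionaries like these, grouped into one list per sentence, are the input shape that CRF toolkits such as sklearn-crfsuite accept.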

Keywords: Feature selection; Vietnamese; named entity recognition.
