SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

LASKAR, Sahinur Rahman; MANNA, Riyanka; PAKRAY, Partha and BANDYOPADHYAY, Sivaji. A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation. Comp. y Sist. [online]. 2022, vol.26, n.4, pp.1669-1687. Epub 17-Mar-2023. ISSN 2007-9737. https://doi.org/10.13053/cys-26-4-4423.

Machine translation deals with automatic translation from one natural language to another. Neural machine translation is a widely adopted corpus-based approach to machine translation. However, it requires an adequate amount of training data, and a domain-wise parallel corpus is needed to improve translation performance across various domains. In this work, we prepare a domain-specific parallel corpus for the low-resource English-Assamese language pair, covering the Agriculture, Government Office, Judiciary, Social Media, Tourism, COVID-19, Sports, and Literature domains. We tackle the data scarcity and word-order divergence problems via data augmentation and a prior alignment concept. We also contribute an Assamese pretrained language model and Assamese word embeddings built from Assamese monolingual data, along with a bilingual dictionary-based post-processing step, to enhance transformer-based neural machine translation. We achieve state-of-the-art results in both the forward (English-to-Assamese) and backward (Assamese-to-English) translation directions.

Keywords: English-Assamese; low-resource; neural machine translation; parallel corpus; data augmentation; prior alignment; language model.
