SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

MONTELONGO GONZALEZ, Erick E.; REYES ORTIZ, José A. and GONZALEZ BELTRAN, Beatriz A. Machine Learning Models for Cancer Type Classification with Unstructured Data. Comp. y Sist. [online]. 2020, vol.24, n.2, pp.403-411. Epub 04-Oct-2021. ISSN 2007-9737. https://doi.org/10.13053/cys-24-2-3367.

Machine learning (ML) techniques have been used to classify cancer types in order to support physicians in diagnosing disease. These models are usually based on structured data obtained from clinical databases. However, valuable information contained in the clinical notes of patient records is seldom used. This paper presents an approach for extracting information from clinical notes, based on Natural Language Processing techniques and the Paragraph Vectors algorithm. The extracted representations are then used by ML models to classify liver, breast and lung cancer patients. The chosen ML algorithms, Support Vector Machines (SVM) and Multi-Layer Perceptron (MLP), were compared and evaluated under varying parameters to find the best model. The results are promising: the best model for classification is the MLP, with a precision of 0.89 and an F1-score of 0.87, although the difference in precision between the two models is minimal (0.02).
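As a reading aid for the reported precision and F1-score, the following minimal sketch shows how such metrics can be computed for a three-class problem (liver, breast, lung). It is not the authors' evaluation code: the function names `per_class_scores` and `macro_average` are hypothetical, the labels in `y_true` and `y_pred` are invented toy data, and macro averaging is an assumption, since the abstract does not state which averaging scheme was used.

```python
from collections import Counter

# Class names taken from the abstract; everything else here is illustrative.
LABELS = ("liver", "breast", "lung")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} from paired label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["liver", "liver", "breast", "breast", "lung", "lung"]
y_pred = ["liver", "breast", "breast", "breast", "lung", "liver"]

scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"macro precision={macro_p:.2f} recall={macro_r:.2f} f1={macro_f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a model can score slightly higher on precision than on F1 when its recall is the lower of the two, which is consistent with the pattern reported above (precision 0.89, F1-score 0.87).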

Keywords: Machine learning; natural language processing; cancer classification; support vector machines; neural networks; unstructured data.
