Evaluation of the Quality of Conversational Agents for the Creation of Evaluation Instruments in Bioelectric Signals Measurement

Pérez-Sanpablo, Alberto Isaac; Rodriguez-Urrea, Marcela D.; Arquer-Ruíz, María del Carmen; Ramirez-Morales, Adrian Octavio; Meneses-Peñaloza, Alicia

doi:10.17488/rmib.44.4.11


Revista mexicana de ingeniería biomédica

Online version ISSN 2395-9126; print version ISSN 0188-9532

Abstract

PEREZ-SANPABLO, Alberto Isaac et al. Evaluation of the Quality of Conversational Agents for the Creation of Evaluation Instruments in Bioelectric Signals Measurement. Rev. mex. ing. bioméd [online]. 2023, vol.44, n.spe1, pp.152-164. Epub 21-Jun-2024. ISSN 2395-9126. https://doi.org/10.17488/rmib.44.4.11.

This research aims to evaluate the quality of conversational agents based on Large Language Models for assessing the application of knowledge in Biomedical Engineering. An evaluation instrument covering six topics in bioelectrical signal measurement was developed from questions prepared by a human agent and by the conversational agents Chat-GPT and Bard. The quality of the instrument was evaluated in terms of level of thinking, validity, relevance, clarity, difficulty, and discrimination capacity, using the kappa (k) index of agreement between two experts and a Rasch analysis of the results from thirty-eight students. After seven questions generated by the conversational agents were eliminated because of validity and originality problems, a 6-question instrument was assembled. The questions were valid, relevant, and clear (>0.95, k=1.0), ranged from low to high difficulty (0.61-0.87, k=0.83), had an adequate discrimination index (0.11-0.47), and were at the analysis level of thinking (k=0.22). The students' average score was 7.24±2.40. This is the first critical analysis of the quality of conversational agents at a level of thinking higher than comprehension. Compared with the human expert, the conversational agents showed limitations in validity, originality, difficulty, and discrimination, which highlights the need to supervise them.

Keywords: artificial intelligence; Bard; biomedical engineering; Chat-GPT; educational measurement.
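The abstract reports expert agreement as a kappa index and item quality as difficulty and discrimination indices. The following is a minimal sketch of how such quantities are commonly computed, not the authors' procedure. It assumes that the two-expert kappa is Cohen's kappa and uses classical-test-theory definitions: difficulty as the proportion of correct answers, and discrimination as the upper-minus-lower group difference with 27% groups. The article's own figures also rely on a Rasch analysis, which this sketch does not implement. The function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who assign nominal labels to the same items.

    Assumption: the two-expert "kappa (k) index" in the abstract is read here
    as Cohen's kappa; the article may use a different variant.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item, so kappa is 0/0;
        # this sketch returns 1.0 by convention for that perfect-agreement case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def item_difficulty(item_responses):
    """Classical difficulty index: proportion of correct answers (1 = correct, 0 = wrong)."""
    if not item_responses:
        raise ValueError("no responses")
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    Assumption: groups are the top and bottom `fraction` of students ranked by
    total score (27% is a common textbook choice, not taken from the article).
    """
    if len(item_responses) != len(total_scores) or not total_scores:
        raise ValueError("need one item response per student")
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank students from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_lower = sum(item_responses[i] for i in lower) / k
    p_upper = sum(item_responses[i] for i in upper) / k
    return p_upper - p_lower
```

For example, if two experts label four questions `["clear", "clear", "unclear", "clear"]` and agree on every one, `cohens_kappa` returns 1.0. Kappa discounts the agreement expected by chance from each rater's label frequencies. For this reason it can be low, as with the k=0.22 reported for level of thinking, even when the raw percentage of agreement looks high.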
