SciELO - Scientific Electronic Library Online

 

Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Abstract

HEREDIA-MARQUEZ, Arturo; GUZMAN-ARENAS, Adolfo and MARTINEZ-LUNA, Gilberto Lorenzo. Feature Selection Ordered by Correlation - FSOC. Comp. y Sist. [online]. 2023, vol.27, n.1, pp.33-51. Epub 16-Jun-2023. ISSN 2007-9737. https://doi.org/10.13053/cys-27-1-3982.

Data sets have increased in volume and in number of features, yielding longer times for classification and training. When an object has many features, it often occurs that not all of them are highly correlated with the target class, and that significant correlation may exist between certain pairs of features. An adequate removal of “useless” features saves time and effort in data collection and ensures faster learning and classification times, with little or no reduction in classification accuracy. This article presents a new filter-type method, called FSOC (Feature Selection Ordered by Correlation), to select relevant features at small computational cost. FSOC achieves this reduction by selecting a subset of the original features. FSOC does not combine existing features to produce a new set of fewer features, since artificially created features mask the relevance of the original features in class assignment, making the new model difficult to interpret. To test FSOC, a statistical analysis was performed on a collection of 36 data sets from several repositories, some with millions of objects. The classification percentages (efficiency) of FSOC were similar to those of other feature selection methods. Nevertheless, when obtaining the selected features, FSOC was up to 42 times faster than other algorithms such as Correlation Feature Selection (CFS), Fast Correlation-Based Filter (FCBF) and Efficient feature selection based on correlation measure (ECMBF).
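The general idea of a correlation-based filter can be illustrated with a minimal sketch. This is not the FSOC algorithm itself, whose steps the abstract does not specify; it is a generic filter that assumes Pearson correlation as the measure, ranks features by absolute correlation with the target, and skips any feature that is strongly correlated with one already kept. The function names and the thresholds `min_relevance` and `max_redundancy` are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(columns, target,
                          min_relevance=0.1, max_redundancy=0.9):
    """Generic correlation filter (illustrative assumption, not FSOC).

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric class labels.
    Returns a subset of the original feature names; no new features
    are constructed, so the result stays interpretable.
    """
    # Relevance: absolute correlation of each feature with the target.
    relevance = {name: abs(pearson(col, target))
                 for name, col in columns.items()}
    # Drop weakly relevant features, then order the rest by relevance.
    ordered = sorted(
        (name for name in columns if relevance[name] >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ordered:
        # Keep a feature only if it is not redundant with any kept one.
        if all(abs(pearson(columns[name], columns[kept])) < max_redundancy
               for kept in selected):
            selected.append(name)
    return selected


target = [0, 0, 1, 1, 0, 1]
features = {
    "a": [1, 2, 8, 9, 2, 9],   # strongly correlated with the target
    "b": [1, 3, 7, 9, 2, 8],   # relevant, but nearly duplicates "a"
    "c": [5, 1, 5, 1, 3, 3],   # uncorrelated with the target
    "d": [3, 1, 4, 2, 1, 4],   # moderately relevant, not redundant
}
print(select_by_correlation(features, target))
```

On this toy data the filter keeps `a` and `d`: `c` falls below the relevance threshold and `b` is discarded as redundant with `a`. Because the output is a subset of the original feature names, it reflects the interpretability argument made in the abstract against constructing new combined features.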

Keywords: Feature selection; data mining; pre-processing; feature reduction; data analysis.
