An Efficient Heuristic Applied to the K-means Algorithm for the Clustering of Large Highly-Grouped Instances

Pérez-Ortega, Joaquín; Hidalgo-Reyes, Miguel; Castro-Sánchez, Noé Alejandro; Pazos-Rangel, Rodolfo; Díaz-Parra, Ocotlán; Olivares-Peregrino, Víctor; Almanza-Ortega, Nelva

doi:10.13053/cys-22-2-2546


Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

PEREZ-ORTEGA, Joaquín et al. An Efficient Heuristic Applied to the K-means Algorithm for the Clustering of Large Highly-Grouped Instances. Comp. y Sist. [online]. 2018, vol.22, n.2, pp.607-619. Epub 21-Jan-2021. ISSN 2007-9737. https://doi.org/10.13053/cys-22-2-2546.

With the growing presence of Big Data, the need arises to cluster large instances. These instances contain many objects with multidimensional features, which must be grouped into hundreds or thousands of clusters. This article presents a new improvement to the K-means algorithm aimed at efficiently solving instances with a large number of clusters and dimensions. The heuristic, called Honeycomb (HC), is based on the relationship between the number of dimensions and the number of centroids that form a neighborhood. This relationship allows the heuristic to reduce the number of distance calculations performed for each object. The heuristic was validated on a set of synthetic instances, achieving reductions in execution time of up to 90% with a loss of solution quality of less than 1% with respect to standard K-means. On real instances of low and high dimensionality, HC reduced execution time by 84.74% and 95.44%, with quality losses of 1.07% and 1.62%, respectively. These experimental results are encouraging, because the heuristic could benefit domains in which instances keep growing in number of objects, dimensions and clusters.
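The abstract describes HC only at a high level: for each object, distances are computed against a neighborhood of centroids rather than against all k centroids. The sketch below illustrates that general idea and is not the authors' Honeycomb heuristic. It assumes the neighborhood of a centroid is the centroid itself plus its `m` nearest centroids, where `m` is a hypothetical free parameter (in HC the neighborhood size is tied to the number of dimensions, a rule the abstract does not spell out). The function name `kmeans_neighborhood` is also hypothetical. It counts distance calculations so that the saving relative to the n·k calculations per iteration of standard K-means is visible.

```python
import numpy as np

def kmeans_neighborhood(X, k, m, iters=20, seed=0):
    """Hypothetical sketch: Lloyd's K-means in which, after one full
    assignment, each object is compared only with the centroids in the
    neighborhood of its current centroid.

    ASSUMPTION: neighborhood = own centroid + its m nearest centroids.
    This is an illustration, not the published Honeycomb rule.
    Returns (centroids, labels, number_of_distance_calculations).
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    n = X.shape[0]

    # First pass: full assignment, n * k distance calculations.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    calcs = n * k

    for _ in range(iters):
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)

        # Build neighborhoods: force each centroid to rank first in its
        # own row, then keep it plus its m nearest other centroids.
        cd = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(cd, -1.0)
        neigh = np.argsort(cd, axis=1)[:, : m + 1]

        # Restricted assignment: only len(cand) distances per object.
        new_labels = labels.copy()
        for i, x in enumerate(X):
            cand = neigh[labels[i]]
            dists = ((centroids[cand] - x) ** 2).sum(axis=1)
            new_labels[i] = cand[dists.argmin()]
            calcs += len(cand)

        # Stop when no object changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return centroids, labels, calcs
```

After the first full pass, each iteration in this sketch costs n·(m+1) distance calculations instead of n·k, which is where a large saving would come from when k is in the hundreds or thousands and m is small. The trade-off is that an object whose nearest centroid lies outside the neighborhood is not reassigned to it, which is one plausible way a small loss of solution quality can arise.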

Keywords: K-means algorithm; computational complexity; heuristic.

Also available: abstract in Spanish; full text in Spanish (PDF).