Document kNN Classification Using GPU

Bresler Camps, Rubén; Gil García, Reynaldo


Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 15, no. 1, Mexico City, Jul./Sep. 2011


Rubén Bresler Camps¹ and Reynaldo Gil García²

¹ Empresa de Desarrollo de Aplicaciones, Tecnologías y Sistemas, Santiago de Cuba, Cuba. E-mail: ruben.bressler@cerpamid.co.cu

² Centro de Reconocimiento de Patrones y Minería de Datos, Santiago de Cuba, Cuba. E-mail: gil@cerpamid.o.cu

Received February 12, 2011.
Accepted June 30, 2011.

Abstract

The search for the k nearest neighbors has been applied to a wide variety of applications in Text Mining and Information Retrieval because of its simplicity and accuracy. However, these fields generally handle objects described by very high-dimensional feature spaces, which makes finding the k objects most similar to a given one computationally intensive, owing to the large number of operations needed to compute the similarity between all the objects involved. In this paper we propose two methods for parallel sparse matrix multiplication on a GPU that reduce the time the kNN algorithm spends computing similarities between objects when classifying documents.

Keywords: GPGPU, document classification, sparse matrix multiplication.
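The abstract does not describe the two GPU methods themselves, so the following is only a minimal sequential sketch, under our own assumptions, of why kNN document classification reduces to a sparse matrix product. If each document is an L2-normalized sparse term vector, the cosine similarities between query documents Q and training documents D are exactly the entries of S = Q·Dᵀ. The dict-based sparse format, the helper names (`normalize`, `build_inverted_index`, `similarity_row`, `knn_classify`) and the tie-breaking rule are hypothetical illustrations, not the authors' implementation. What the sketch does show is that every row of S can be computed independently, which is the kind of parallelism a GPU kernel can exploit (for example, one thread or warp per query row).

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical sparse document format: term_id -> weight (e.g. TF or TF-IDF).
SparseVec = dict[int, float]


def normalize(v: SparseVec) -> SparseVec:
    """Scale to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm > 0 else dict(v)


def build_inverted_index(train: list[SparseVec]) -> dict[int, list[tuple[int, float]]]:
    """Column-oriented view of the training matrix D (term -> [(doc, weight)]),
    which is the same as storing D^T row by row."""
    index: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for doc_id, vec in enumerate(train):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index


def similarity_row(query: SparseVec, index) -> dict[int, float]:
    """Compute one row of S = Q * D^T. Only training documents sharing at
    least one term with the query get an entry; all others have similarity 0.
    Rows do not depend on each other, so they could be computed in parallel."""
    acc: dict[int, float] = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, ()):
            acc[doc_id] += qw * dw
    return acc


def knn_classify(queries: list[SparseVec], train: list[SparseVec],
                 labels: list[str], k: int) -> list:
    """Label each query by majority vote among its k most similar training docs.
    Ties are broken by the summed similarity of each class (an assumption).
    Returns None for a query that shares no terms with any training document."""
    index = build_inverted_index([normalize(v) for v in train])
    predictions = []
    for q in queries:
        row = similarity_row(normalize(q), index)
        neighbours = heapq.nlargest(k, row.items(), key=lambda kv: kv[1])
        votes: Counter = Counter()
        score: dict[str, float] = defaultdict(float)
        for doc_id, sim in neighbours:
            votes[labels[doc_id]] += 1
            score[labels[doc_id]] += sim
        best = max(votes, key=lambda c: (votes[c], score[c])) if votes else None
        predictions.append(best)
    return predictions


if __name__ == "__main__":
    # Toy corpus: term ids 0-2 are "hardware" terms, 3-4 are "sports" terms.
    train = [{0: 2.0, 1: 1.0}, {0: 1.0, 2: 1.0}, {3: 3.0, 4: 1.0}, {3: 1.0, 4: 2.0}]
    labels = ["hpc", "hpc", "sports", "sports"]
    queries = [{0: 1.0, 1: 1.0}, {4: 1.0}]
    print(knn_classify(queries, train, labels, k=2))  # ['hpc', 'sports']
```

Because the inverted index is Dᵀ stored row-wise, the inner loops touch only the nonzero entries that a query and a training document have in common, which is what makes the sparse formulation cheaper than comparing dense high-dimensional vectors. A GPU version would presumably keep the same row-wise decomposition but store the matrices in flat arrays (such as CSR) in device memory instead of Python dictionaries.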

