Polibits
Online version ISSN 1870-9044
Abstract
DĂNĂILĂ, Iulia; DINU, Liviu P.; NICULAE, Vlad and SULEA, Octavia-Maria. String Distances for Near-duplicate Detection. Polibits [online]. 2012, n.45, pp.21-25. ISSN 1870-9044.
Near-duplicate detection is important in data mining tasks over large, noisy databases. In this paper, we present the results of applying the Rank distance and the Smith-Waterman distance, alongside more widely used string similarity measures such as the Levenshtein distance, in combination with a disjoint set data structure, to the problem of near-duplicate detection.
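The abstract does not spell out how the distances and the disjoint set fit together, so the following is only a minimal sketch under stated assumptions: every pair of records is compared with the Levenshtein distance, pairs whose normalized distance falls under a threshold are merged in a disjoint set (union-find), and the resulting sets are the near-duplicate groups. The normalization by the longer string, the `threshold` value and the names `near_duplicate_groups` and `DisjointSet` are hypothetical choices for illustration, not the authors' method; the Rank or Smith-Waterman distance could be swapped in for `levenshtein`.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            # Path halving: point x at its grandparent, then move up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        # Attach the smaller tree under the larger one.
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]


def near_duplicate_groups(records: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Group records whose normalized edit distance is <= threshold.

    Hypothetical choices: the distance is divided by the longer record's
    length, and the default threshold is illustrative, not from the paper.
    """
    ds = DisjointSet(len(records))
    # All-pairs comparison: quadratic in the number of records.
    for i, j in combinations(range(len(records)), 2):
        longest = max(len(records[i]), len(records[j])) or 1
        if levenshtein(records[i], records[j]) / longest <= threshold:
            ds.union(i, j)
    # Collect records by the root of their set, in first-seen order.
    groups: dict[int, list[str]] = {}
    for idx, rec in enumerate(records):
        groups.setdefault(ds.find(idx), []).append(rec)
    return list(groups.values())
```

For example, `near_duplicate_groups(["John Smith, 12 Main St", "Jon Smith, 12 Main St.", "Mary Jones, 4 Oak Ave"])` places the first two records (edit distance 2 over 22 characters) in one group and leaves the third on its own. The disjoint set makes grouping transitive: if record A is close to B and B is close to C, all three land in the same group even when A and C are not directly within the threshold. The all-pairs loop is quadratic, which would need blocking or indexing for genuinely large databases.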
Keywords: Near-duplicate detection; string similarity measures; database; data mining.