Approximation algorithms for Hamming clustering problems

Leszek Ga̧sieniec, Jesper Jansson, Andrzej Lingas

Research output: Contribution to journalConference articlepeer-review

23 Scopus citations

Abstract

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S. We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(logn) and p = O(1), or when p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any ε > 0 it is NP-hard to split S into at most pk1/7-ε clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p = 3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293 306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pρ/ε)k O(p/ε)n2-time (1 + ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ρ = O(log(k + n)). Finally, we show how to find in O((n/ε + kn log n + k 2 log n) (2ρk)2/ε) time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)ρ, for any constant 0 < ε < 1.

Original languageEnglish (US)
Pages (from-to)289-301
Number of pages13
JournalJournal of Discrete Algorithms
Volume2
Issue number2 SPEC. ISS.
DOIs
StatePublished - Jun 2004
Externally publishedYes
EventCombinatiorial Pattern Matching - Montreal, Canada
Duration: Jun 21 2000Jun 23 2000

Keywords

  • Approximation algorithms
  • Hamming distance
  • Integer programming
  • NP-hardness
  • P-clustering problem

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Discrete Mathematics and Combinatorics
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'Approximation algorithms for Hamming clustering problems'. Together they form a unique fingerprint.

Cite this