Abstract
We consider a generalization of the fundamental k-means clustering problem to data with incomplete or corrupted entries. When data objects are represented by points in ℝ^d, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most Δ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most Δ, called a Δ-point. We thus seek a partition of n input Δ-points into k clusters that minimizes the k-means objective. For Δ = 0, when all coordinates of each point are specified, this is the usual k-means clustering problem. We give an algorithm that finds a (1 + ε)-approximate solution in time f(k, ε, Δ) · n² · d, for some function f of k, ε, and Δ only.
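To make the objective concrete, the following is a minimal sketch of how the k-means cost of a set of Δ-points could be evaluated for given centers. It assumes, since the abstract does not spell this out, that the cost of a Δ-point is its squared Euclidean distance to the nearest center, measured from the center to the closest point of the axis-parallel subspace, so unspecified coordinates contribute nothing. The names `dist2` and `kmeans_cost` and the use of `None` for missing entries are illustrative choices, and the sketch only evaluates the objective; it is not the approximation algorithm of the paper.

```python
def dist2(point, center):
    """Squared distance from `center` to the axis-parallel subspace of `point`.

    Assumption: missing entries are encoded as None. The subspace is free in
    those coordinates, so only the specified coordinates are summed.
    """
    return sum((x - c) ** 2 for x, c in zip(point, center) if x is not None)


def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(dist2(p, c) for c in centers) for p in points)


# Three points in R^3; the first two are 1-points (one missing entry each),
# the third is fully specified (a 0-point).
points = [
    (0.0, None, 1.0),
    (0.0, 2.0, None),
    (5.0, 5.0, 5.0),
]
centers = [(1.0, 2.0, 1.0), (5.0, 5.0, 4.0)]

cost = kmeans_cost(points, centers)
print(cost)
```

With every coordinate specified (Δ = 0), `dist2` reduces to the ordinary squared Euclidean distance, which matches the remark that this case is the usual k-means problem.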
Original language | English |
---|---|
Title of host publication | Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) |
Publisher | SIAM |
Pages | 2649-2659 |
Number of pages | 11 |
ISBN (Electronic) | 978-1-61197-646-5 |
Publication status | Published - 7 Jan 2021 |