Working Paper, Article, Version 2. This version is not peer-reviewed.

k-Means+++: Outliers-Resistant Clustering

Version 1 : Received: 23 September 2020 / Approved: 24 September 2020 / Online: 24 September 2020 (03:22:16 CEST)
Version 2 : Received: 18 November 2020 / Approved: 19 November 2020 / Online: 19 November 2020 (10:54:12 CET)

How to cite: Statman, A.; Rozenberg, L.; Feldman, D. k-Means+++: Outliers-Resistant Clustering. Preprints 2020, 2020090558

Abstract

The $k$-means problem is to compute a set of $k$ centers (points) that minimizes the sum of squared distances to a given set of $n$ points in a metric space. Arguably, the most common algorithm for solving it is $k$-means++, which is easy to implement and provides a provably small approximation factor in time that is linear in $n$. We generalize $k$-means++ to support: (i) non-metric spaces and any pseudo-distance function. In particular, it supports M-estimator functions that handle outliers, e.g., where the distance $\mathrm{dist}(p,x)$ between a pair of points is replaced by $\min\{\mathrm{dist}(p,x),1\}$. (ii) $k$-means clustering with $m\geq 1$ outliers, i.e., where the $m$ farthest points from the $k$ centers are excluded from the total sum of distances. This is the first algorithm whose running time is linear in $n$ and polynomial in $k$ and $m$.
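
To make the two objectives in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm (whose details are not given on this page). It shows standard $D^2$ sampling in the style of $k$-means++ with a pluggable distance function, the capped distance $\min\{\mathrm{dist}(p,x),1\}$, and a cost that excludes the $m$ farthest points. All function names (`dist`, `capped_dist`, `seed_centers`, `cost_with_outliers`) are hypothetical, and the use of Euclidean distance on tuples is an assumption made for illustration.

```python
import math
import random

def dist(p, x):
    """Euclidean distance between two points given as tuples (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, x)))

def capped_dist(p, x, cap=1.0):
    """Capped distance min{dist(p, x), cap}; far points contribute at most cap."""
    return min(dist(p, x), cap)

def seed_centers(points, k, d=dist, rng=None):
    """D^2 sampling in the style of k-means++, with a pluggable distance d.

    Illustrative only: this is the classical seeding rule, not the
    generalized procedure of the paper.
    """
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(d(p, c) for c in centers) ** 2 for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to uniform sampling.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def cost_with_outliers(points, centers, m=0, d=dist):
    """Sum of squared distances to the nearest center, ignoring the m farthest points."""
    sq = sorted(min(d(p, c) for c in centers) ** 2 for p in points)
    return sum(sq[:max(0, len(sq) - m)])
```

For example, with `points = [(0, 0), (0, 1), (10, 10), (10, 11), (100, 100)]` and centers `[(0, 0), (10, 10)]`, the single far point `(100, 100)` dominates the plain cost. Calling `cost_with_outliers(points, centers, m=1)` drops it entirely, while passing `d=capped_dist` keeps it but limits its contribution to 1.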

Subject Areas

Clustering; Approximation; Outliers

Comments (1)

Comment 1
Received: 19 November 2020
Commenter: Adiel Statman
Commenter's Conflict of Interests: Author
Comment: Many conceptual additions (in sections 3 and 5) and also a mathematical error.