Theoretical work on the validity of restricting centroid movement in k-means

Question

Joe89

June 2, 2022 10:03

I recently received a manuscript for review in which the author added ~1000 "fake" data points so that the final k-means centroid stays within a required range. Neither the author nor I seem to have a background in data science, and the paper is more of an application to our research area.

I have tried to find published work on this method of restricting k-means centers, but could not find any. On simple logic, however, it seems like a valid approach, so maybe the author used the wrong terminology.

Hence, I would like to ask: is this a valid way to restrict k-means centers, and is there any published work on it?

Topic k-means

Category Data Science

Brian Spiering · Accepted Answer · April 25, 2020 16:33

A more general solution would be constrained optimization: change the loss function so that it only allows solutions within a certain region.

Adding fake data points to nudge the solution into a valid region has limitations: it requires manual adjustment for every model run, and it offers no guarantees. Constrained optimization would be automated and would provide strong guarantees.
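As a rough illustration of the constrained approach, here is a minimal sketch that assumes the "required range" is an axis-aligned box. For a box (or any convex region), the point in the region that minimizes the within-cluster sum of squares is the projection of the cluster mean onto the region, which for a box is a coordinate-wise clip. The function name `box_constrained_kmeans`, its helpers, and the toy data are hypothetical and not taken from the manuscript or from any library.

```python
def clip(point, lower, upper):
    """Project a point onto the box [lower, upper], coordinate by coordinate."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(point, lower, upper)]


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(X, centers):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in X]


def box_constrained_kmeans(X, init_centers, lower, upper, n_iter=100):
    """Lloyd-style k-means whose centroids never leave the box."""
    centers = [clip(c, lower, upper) for c in init_centers]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(X, labels) if lab == j]
            # Constrained update: cluster mean projected onto the box.
            # An empty cluster keeps its previous centroid.
            new_centers.append(clip(mean(members), lower, upper) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(X, centers)


# Toy data: one group near the origin, one group outside the allowed box.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0]]
lower, upper = [-1.0, -1.0], [4.0, 4.0]
centers, labels = box_constrained_kmeans(X, [X[0], X[4]], lower, upper)
print(centers)  # [[0.5, 0.5], [4.0, 4.0]]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
```

The second centroid would sit at (5.5, 5.5) without the constraint; here it is held at the box corner on every run, with no fake points and no manual tuning.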

krayyem · Accepted Answer · October 29, 2018 08:30

I highly recommend finding a source that explains how k-means works and understanding it well. K-means is so well known that it is hard to find a reference that discusses it as an algorithm or explains how it works.

I noticed you wrote "author used ~1000 "fake" data points, so that the final centroid of K-mean stays within the required range", which is always going to be true. K-means calculates the mean (average) of the data points assigned to each cluster, which guarantees that every centroid ends up within the range of the data used.

The power of k-means is that it recalculates the means iteratively until the means (centroids) are stable. In other words, in each iteration the means shift toward the centers of the dense regions. Given that, if you are looking for a single cluster (k = 1), you will find its centroid in one iteration.

I personally suggest starting with some videos and going on from there. Here is the first result on YouTube about k-means: https://youtu.be/_aWzGGNrcic.
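A small check of that claim, as a sketch with made-up points and an arbitrary cluster assignment (both are assumptions for illustration): the mean of any non-empty subset of points lies inside the bounding box of the full data set. Note that this bounds the centroids by the data itself, which is not necessarily the same as a narrower range chosen by the user.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


# Hypothetical data and an arbitrary assignment of points to 3 clusters.
X = [[-3.0, 2.0], [1.0, 7.0], [4.0, -1.0], [6.0, 3.0], [0.0, 0.0], [2.0, 5.0]]
labels = [0, 1, 0, 1, 2, 2]

centroids = [mean([p for p, lab in zip(X, labels) if lab == j]) for j in range(3)]

# Per-coordinate minimum and maximum over all data points.
lo = [min(col) for col in zip(*X)]
hi = [max(col) for col in zip(*X)]

inside = all(l <= c <= h for cen in centroids for c, l, h in zip(cen, lo, hi))
print(centroids)  # [[0.5, 0.5], [3.5, 5.0], [1.0, 2.5]]
print(inside)     # True
```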
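The single-centroid case can be sketched as follows, assuming the standard assign-then-average iteration; `lloyd_step` and the sample points are hypothetical names and data chosen for illustration. With k = 1 every point is assigned to the only centroid, so the first update already lands on the overall mean and a second update changes nothing.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_step(X, centers):
    """One k-means iteration: assign points, then recompute the means."""
    labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
              for p in X]
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(X, labels) if lab == j]
        # An empty cluster keeps its previous centroid.
        new_centers.append(mean(members) if members else old)
    return new_centers


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
start = [[100.0, -100.0]]           # deliberately poor starting centroid

after_one = lloyd_step(X, start)
after_two = lloyd_step(X, after_one)
print(after_one)               # [[3.0, 5.0]]
print(after_two == after_one)  # True
```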
