Clustering 2D curves

I have a set of curves in 2D space, each expressed as a set of sampled data points. Each set has roughly the same number of points; if it helps, I could resample or bin the curves so that every one has the same number of points (say 50).

I would like to cluster the curves into N groups. Determining N should be part of the solution.

Translations along the first dimension are irrelevant: two curves that differ only by a shift in that dimension should be treated as the same curve.

I have a k-means implementation available.

I was thinking of transforming the problem into a 100-dimensional space (50 × 2), where the sampled coordinates of each curve become its features.
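To make the idea concrete, here is a minimal sketch of that transformation, assuming NumPy is available. The function name `curve_to_features`, the choice of linear interpolation over the sample index, and the use of mean-centering to discard the shift in the first dimension are all illustrative assumptions, not a fixed recipe.

```python
import numpy as np

def curve_to_features(points, n_samples=50):
    """Resample a 2D curve to n_samples points and flatten it to one vector.

    points: sequence of (x, y) pairs, in order along the curve.
    Returns a vector of length 2 * n_samples: all x values, then all y values.
    """
    pts = np.asarray(points, dtype=float)
    # Parameterize the curve by sample index, rescaled to [0, 1] (an assumption;
    # arc length would be another reasonable choice).
    t_old = np.linspace(0.0, 1.0, len(pts))
    t_new = np.linspace(0.0, 1.0, n_samples)
    x = np.interp(t_new, t_old, pts[:, 0])
    y = np.interp(t_new, t_old, pts[:, 1])
    # Discard translations along the first dimension by centering x.
    x = x - x.mean()
    return np.concatenate([x, y])
```

Stacking the vectors of all curves row by row gives a matrix with one row per curve and 100 columns, which is the input shape a typical k-means implementation expects.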

Could this approach work? Is there a better one, either using k-means or a different algorithm?

Topic unsupervised-learning clustering machine-learning

Category Data Science


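Yes, you could start with k-means on the 100-dimensional data. Standardize the data first, for example to zero mean and unit variance per feature: k-means relies on Euclidean distances, so features with a larger scale would otherwise dominate the clustering.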
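As a rough sketch of this step, assuming NumPy and a feature matrix `X` with one row per curve: `standardize` scales each column, and `kmeans` is a plain Lloyd's-algorithm stand-in for whatever k-means implementation you already have. Both function names and the random initialization are illustrative choices.

```python
import numpy as np

def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X - mean) / std

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm; returns labels, centers and inertia."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points; keep it if it is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squares for the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2.min(axis=1).sum()
    return labels, centers, inertia
```

The returned inertia (within-cluster sum of squares) is the quantity usually compared across different values of k.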

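Use the elbow method to choose the number of clusters: run k-means for a range of values of k, plot the within-cluster sum of squares against k, and pick the k at which the curve stops dropping sharply. This is a heuristic, so treat the result as a reasonable starting point rather than a guaranteed optimum.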
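The elbow is usually read off a plot by eye, but it can also be approximated automatically. The sketch below, assuming NumPy, uses one common heuristic: after rescaling both axes to [0, 1], pick the k whose point lies farthest from the straight line joining the first and last points. The function name `elbow_k` is hypothetical, and the inertia values are assumed to come from your own k-means runs and to not all be equal.

```python
import numpy as np

def elbow_k(ks, inertias):
    """Return the k at the 'elbow' of the inertia curve.

    ks: increasing cluster counts that were tried, e.g. [1, 2, 3, ...].
    inertias: within-cluster sum of squares obtained for each k.
    """
    ks = np.asarray(ks, dtype=float)
    w = np.asarray(inertias, dtype=float)
    # Rescale both axes to [0, 1] so neither dominates the distance.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (w - w.min()) / (w.max() - w.min())
    # Perpendicular distance of each point from the line through the endpoints.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[np.argmax(dist)])
```

For example, with made-up inertias `[1000, 400, 100, 90, 85, 80]` for k = 1 to 6, the curve flattens after k = 3 and `elbow_k` returns 3.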

By the way, you should also check whether the two dimensions are correlated. If they are, you can use PCA to convert the two dimensions into a single dimension that explains most of the variance in your data, and then cluster using 50 dimensions per observation.
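One possible reading of this suggestion, sketched with NumPy: pool the (x, y) samples of all curves, find their first principal component, and replace each sample by its coordinate along that component. The function name and the array layout `(n_curves, n_points, 2)` are assumptions made for illustration.

```python
import numpy as np

def project_to_first_component(curves):
    """Reduce each (x, y) sample to one coordinate along the first principal axis.

    curves: array of shape (n_curves, n_points, 2).
    Returns an array of shape (n_curves, n_points) and the fraction of the
    total variance explained by the first component.
    """
    curves = np.asarray(curves, dtype=float)
    pts = curves.reshape(-1, 2)           # pool all samples of all curves
    centered = pts - pts.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    projected = centered @ vt[0]
    return projected.reshape(curves.shape[0], curves.shape[1]), explained
```

If the explained fraction is close to 1, the two dimensions are strongly linearly related and little is lost by clustering on the 50 projected values; if it is well below 1, keeping both dimensions is probably the safer choice.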
