Ordering scrambled 1D data sets by continuity

This is a cute little clustering problem that has probably been solved a million times over, but I couldn't find a good reference for it.

I have 20 1D datasets with 400 entries each. In the picture they are denoted by different colors.

As you can see, they are also pretty continuous. However, for each index i the datasets have been re-ordered by magnitude, i.e. instead of nice continuous lines, the color now jumps at every intersection of any two datasets.

Is there a way to bring the datasets back to their original order, i.e. cluster the data into 20 continuous lines? This could be easily done by eye.

Thank you very much!

Ofri

P.S. Here's what I tried so far. For each index i, I assume the datasets are already ordered up to index i-1. I then prepare 20 bins by extrapolating the ordered datasets from the first i-1 indices to index i. Now I have 20 values to put in 20 bins, with exactly one value in each bin. I could try all 20! permutations and pick the one with the minimal error, but there must be a more clever/efficient way.
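One observation that may avoid the 20! search: if the error is the sum of squared (or absolute) differences between extrapolated and observed values, the best one-to-one matching in 1D pairs the k-th smallest prediction with the k-th smallest value, so a sort is enough. Below is a minimal sketch of that idea, not a tested solution for the real data. It assumes linear extrapolation from the last two points, and the function name `untangle` and the toy data are made up for illustration.

```python
def untangle(sorted_rows):
    """sorted_rows[i] holds the values at index i, sorted ascending.
    Returns one list of values per reconstructed curve."""
    n_curves = len(sorted_rows[0])
    curves = [[v] for v in sorted_rows[0]]
    for row in sorted_rows[1:]:
        # Predict each curve's next value (assumption: linear extrapolation;
        # with a single point so far, just repeat the last value).
        preds = []
        for c in curves:
            if len(c) >= 2:
                preds.append(2 * c[-1] - c[-2])
            else:
                preds.append(c[-1])
        # Curve with the k-th smallest prediction gets the k-th smallest value.
        order = sorted(range(n_curves), key=lambda k: preds[k])
        for rank, k in enumerate(order):
            curves[k].append(row[rank])
    return curves


# Toy data: two straight lines that cross at x = 5, scrambled by sorting.
rows = [sorted((x, 10 - x)) for x in range(11)]
curves = untangle(rows)
print(curves[0])  # rising line
print(curves[1])  # falling line
```

On real data the extrapolation can be wrong near tangencies or where curves approach each other with nearly equal slopes, so a higher-order extrapolation or a look at a few indices ahead may be needed there.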



I ended up solving this by cutting the datasets into segments and then gluing them together. Here are some of the disentangled datasets:

[image: disentangled datasets]

First I cut all the datasets into continuous segments by identifying the discontinuities (I looked at peaks of the second derivative). I then sewed the segments together, attaching to each segment the one that is closest to the 2nd-order extrapolation from its neighboring segment.
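A rough sketch of these two steps, for illustration only and not the code actually used. It assumes equally spaced samples, a fixed threshold on the discrete second difference in place of real peak detection, and segments of at least three points. All function names are hypothetical.

```python
def second_diff(y, i):
    """Discrete second derivative of y at interior index i."""
    return y[i + 1] - 2 * y[i] + y[i - 1]


def split_at_kinks(y, threshold):
    """Cut y after every interior index whose |second difference| exceeds
    threshold. Returns a list of (start_index, values) pairs."""
    segments, start = [], 0
    for i in range(1, len(y) - 1):
        if abs(second_diff(y, i)) > threshold:
            segments.append((start, y[start:i + 1]))
            start = i + 1
    segments.append((start, y[start:]))
    return segments


def extrapolate_quadratic(values):
    """Next value of the parabola through the last three points."""
    a, b, c = values[-3:]
    return 3 * c - 3 * b + a


def best_continuation(segment, candidates):
    """Among candidates starting right after segment ends, pick the one whose
    first value is closest to the quadratic extrapolation of segment."""
    start, values = segment
    target = extrapolate_quadratic(values)
    next_index = start + len(values)
    following = [s for s in candidates if s[0] == next_index]
    return min(following, key=lambda s: abs(s[1][0] - target))


# Toy data: the sorted (scrambled) version of two lines crossing at x = 5.
lower = [min(x, 10 - x) for x in range(11)]
upper = [max(x, 10 - x) for x in range(11)]
segments = split_at_kinks(lower, 1) + split_at_kinks(upper, 1)
first = segments[0]
joined = first[1] + best_continuation(first, segments)[1]
print(joined)
```

When a crossing falls between two samples, the kink is spread over neighboring indices and a plain threshold produces very short segments, so those need to be merged or handled separately.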

There were some special cases that I had to deal with, for example actual divergences in the signal. I dealt with these by identifying segments that cannot be continuously sewn onto any other segment (these are the ones going to ±infinity) and gluing them to other segments returning from ±infinity.

Ofri


Can't you sort by distance to the previous point?

Manually identify the first point of a segment. Then always find the closest point to the previous one, add it to the result and remove it from the candidates. When out of candidates, add the next file.

No clustering necessary.
