Triplet optimization producing a weird diagonal line?

Question


10GeV

August 26, 2020, 20:13

I'm pretty sure this is the right forum for this; if not, let me know and I'll happily move it to a better place.

I have a strange problem. I've written an algorithm designed to take three files of UNIX timestamps, and produce a list of triplets in order of closeness. Each triplet is unique (no two triplets share an element), each triplet has one element from each file, and each triplet {x,y,z} is created so as to minimize max(x,y,z) - min(x,y,z).
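To make the closeness measure concrete, here is a minimal sketch of the quantity being minimized. The function name `spread` and the use of 64-bit integers for the timestamps are assumptions made for illustration; they are not taken from the linked source code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: closeness of a triplet {x, y, z},
// defined as max(x, y, z) - min(x, y, z). Smaller is closer.
std::int64_t spread(std::int64_t x, std::int64_t y, std::int64_t z) {
    return std::max({x, y, z}) - std::min({x, y, z});
}
```

For example, `spread(3, 10, 7)` is 7, and a triplet of three identical timestamps has a spread of 0.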

When I ran the algorithm and visually examined the output, everything looked great. But when I plotted the data, something weird happened. I plotted the resulting triplets on a 2-dimensional histogram. For a triplet {x,y,z}, the horizontal axis was x-y and the vertical axis was x-z. I ended up with a weird, perfect diagonal line running from the bottom left-hand corner to the upper right-hand corner: https://filebin.net/3d0qfqi7ice1uik0/lines1.pdf?t=derlloqu .

I thought something might've been wrong with my algorithm, so I tried two additional algorithms. First, I wrote a binary search algorithm. Instead of finding a given element, the binary search terminated by comparing the closest found element to its neighbors. I then saved the closest element in a list and, when finished, sorted the list in order of closeness and removed non-unique triplets. I ended up with the same odd diagonal line.
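The nearest-element step of that binary search approach could look roughly like the sketch below. It is not the code from the pastebin: it swaps in `std::lower_bound` for a hand-written search, and the name `nearest_index` is hypothetical. It assumes the input vector is non-empty and sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: index of the element of `sorted` nearest to `target`.
// Assumes `sorted` is non-empty and ascending. Ties go to the smaller element.
std::size_t nearest_index(const std::vector<std::int64_t>& sorted,
                          std::int64_t target) {
    // First element that is >= target.
    const auto it = std::lower_bound(sorted.begin(), sorted.end(), target);
    if (it == sorted.begin()) return 0;                 // target is below all elements
    if (it == sorted.end()) return sorted.size() - 1;   // target is above all elements
    const std::size_t hi = static_cast<std::size_t>(it - sorted.begin());
    // Compare that element with its left neighbor and keep the closer one.
    return (target - sorted[hi - 1] <= sorted[hi] - target) ? hi - 1 : hi;
}
```

Running this once per file for a given timestamp yields a candidate triplet; the sorting by closeness and removal of non-unique triplets would then happen afterwards, as described above.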

I tried another algorithm, the most common for this particular task. Since the three files are sorted, one can use that property to find the closest possible triplet from the three files. Simply pick a starting point, and move forward or back in each of the three files based upon which currently holds the largest or smallest element (I probably did a terrible job of explaining that, a better explanation can be found at: ). Again, I ended up with the same odd diagonal line.
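For readers unfamiliar with that approach, here is a minimal sketch of the usual three-pointer version, which finds the single tightest triplet across three sorted arrays. It is an illustration under stated assumptions, not the code from the pastebin: the name `closest_triplet` is hypothetical, all three inputs are assumed non-empty and ascending, and this variant only ever moves forward by advancing the pointer at the current minimum.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: indices (i, j, k) of the triplet {a[i], b[j], c[k]}
// with the smallest max - min. Assumes all inputs are non-empty and ascending.
std::array<std::size_t, 3> closest_triplet(const std::vector<std::int64_t>& a,
                                           const std::vector<std::int64_t>& b,
                                           const std::vector<std::int64_t>& c) {
    std::size_t i = 0, j = 0, k = 0;
    std::array<std::size_t, 3> best{0, 0, 0};
    std::int64_t best_spread =
        std::max({a[0], b[0], c[0]}) - std::min({a[0], b[0], c[0]});

    while (i < a.size() && j < b.size() && k < c.size()) {
        const std::int64_t lo = std::min({a[i], b[j], c[k]});
        const std::int64_t hi = std::max({a[i], b[j], c[k]});
        if (hi - lo < best_spread) {
            best_spread = hi - lo;
            best = {i, j, k};
        }
        // Only advancing the pointer at the minimum can shrink the spread,
        // because the other two values can only grow as their pointers move.
        if (a[i] == lo) ++i;
        else if (b[j] == lo) ++j;
        else ++k;
    }
    return best;
}
```

Producing a full list of unique triplets would need this step repeated with already-used elements excluded; how the original code handles that is not shown here.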

I thought, perhaps it's something with the data. But, when I match one file of live data with two files of completely random data (uniformly distributed), I get the same, odd diagonal line. Moreover, when I match three files of completely random, uniformly distributed data, I get multiple odd diagonal lines: https://filebin.net/cqw31xysci7kaddn/lines2.pdf?t=jv9srf2t .

Any idea what's going on here? Is this some artifact of this particular kind of analysis? I'm pretty certain at this point it's not my algorithms. All three were completely different, yet all three produced the same result.

If you're interested in the source code: https://pastebin.com/8gL1D6Bw

Topic c++ histogram data visualization algorithms

Category Data Science
