How can I use Hellinger distance on arrays of different lengths?

I have to use Hellinger distance to compare arrays that are not the same length.

How do you do this correctly? Putting a zero in the missing fields for the shorter array does not sound like the best method to me.

Some more info on my data:

Most array dimensions are (1,58), but some others are (1,28). Arrays contain numbers from 1 to 3.

Example:

Array1=[1 1 3 2 3]

Array2=[2 3 1 1]

One possible solution: newArray2=[2 3 1 1 0]

Is it possible to use Hellinger distance in this case? Is there any other distance function that could solve my problem?

I'm using Hellinger in k-means because it's what the author of a paper I'm reading used. So, I would like to solve this issue using Hellinger.

Thanks.



What are you trying to do?

Don't blindly put functions together without thinking about the underlying math!

Hellinger distance is usually applied to histograms, and your arrays don't look like histograms. So something is wrong in your approach... Go back to the drawing board, not to the code.
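To illustrate what "applied to histograms" could mean here, below is a minimal sketch. It assumes that only the relative frequency of the values 1, 2 and 3 matters, not their order in the array; whether that fits your data (or the paper you are following) is something only you can judge. The helper names `to_histogram` and `hellinger` are made up for this example. Each array, whatever its length, is turned into a 3-bin histogram that sums to 1, so the length mismatch disappears without any zero padding.

```python
import numpy as np

def to_histogram(values, categories=(1, 2, 3)):
    """Relative frequency of each category; the result sums to 1."""
    values = np.asarray(values)
    counts = np.array([(values == c).sum() for c in categories], dtype=float)
    return counts / counts.sum()

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

array1 = [1, 1, 3, 2, 3]   # length 5
array2 = [2, 3, 1, 1]      # length 4

h1 = to_histogram(array1)  # frequencies of 1, 2, 3: [0.4, 0.2, 0.4]
h2 = to_histogram(array2)  # frequencies of 1, 2, 3: [0.5, 0.25, 0.25]

d = hellinger(h1, h2)      # a value between 0 (identical) and 1
print(h1, h2, d)
```

If the order of the values carries meaning in your data, this transformation throws that information away, and a histogram-based distance is probably not the right tool.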

P.S. k-means will also need vectors of the same length, and doesn't minimize Hellinger, I'd assume...
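One related fact that may help: for normalized histograms, the Hellinger distance equals the Euclidean distance between the element-wise square roots of the histograms, divided by sqrt(2). So a possible workaround, offered only as a sketch and not necessarily what the paper did, is to take square roots of the histograms and run ordinary k-means on those vectors. Note that the resulting centroids are means of square-rooted vectors and are generally not square roots of valid histograms themselves. The example values below are hypothetical 3-bin histograms, and the function name is made up for this example.

```python
import numpy as np

def hellinger_bc(p, q):
    """Hellinger distance via the Bhattacharyya coefficient sum(sqrt(p*q))."""
    bc = np.sum(np.sqrt(p * q))
    return np.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors

# Two example histograms over the values 1, 2, 3 (each sums to 1).
p = np.array([0.4, 0.2, 0.4])
q = np.array([0.5, 0.25, 0.25])

# Euclidean distance in "square-root space", scaled by 1/sqrt(2).
euclid_on_roots = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Both expressions give the same number for normalized histograms.
print(hellinger_bc(p, q), euclid_on_roots)
```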
