If I understood them correctly, Jeremy's and Edmund's (first) solutions are the same: plain Euclidean distance in a 4-dimensional space of IP addresses. BTW, I think a very fast alternative to Euclidean distance would be to compute a bit-wise Hamming distance.
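To make the Hamming idea concrete, here is a minimal sketch, assuming IPv4 addresses given as dotted-decimal strings. The function name `hamming_distance` is my own choice for illustration; it simply XORs the two 32-bit integers and counts the set bits.

```python
import ipaddress


def hamming_distance(ip_a, ip_b):
    """Number of bit positions in which two IPv4 addresses differ."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # XOR leaves a 1 exactly where the two addresses disagree.
    return bin(a ^ b).count("1")


print(hamming_distance("192.168.1.1", "192.168.1.2"))
print(hamming_distance("192.168.1.1", "191.168.1.1"))
```

Note that 192 and 191 differ in 7 bits (`11000000` vs `10111111`), so numerically adjacent octets can still be far apart under this measure.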
Edmund's first update would be better than his second. The reason is simple to state: his 2nd update tries to define a distance measure by considering a non-linear function of the coordinates of a 4D vector. That however will most likely destroy the key properties that it needs to satisfy in order to be a metric, namely
- Identity of indiscernibles: $d(IP_1,IP_2)=0 \iff IP_1=IP_2$,
- Symmetry: $d(IP_1,IP_2)=d(IP_2,IP_1)$, and
- Triangle inequality: $d(IP_1,IP_2)\leq d(IP_1,IP_3)+d(IP_3,IP_2)\,\forall IP_3$.
The latter is key for later interpreting small distances as close points in IP space. One would need a distance function that is linear in the coordinates. However, simple Euclidean distance is not enough, as you saw.
Physics (well, differential geometry actually) could lead to a nice solution to this problem: define a metric tensor $g$. In plain English: assign a weight to each pair of coordinates, multiply each weight by the product of the two corresponding coordinate differences, and add up those products. Take the square root of that sum and define it as your distance.
For the sake of simplicity, one could start trying with a diagonal metric tensor.
Example: Say you take $g=\begin{pmatrix}1000 &0 &0 &0 \\0 &100&0&0\\0&0&10&0\\0&0&0&1\end{pmatrix}$, $IP_1=(x_1,x_2,x_3,x_4)$ and
$IP_2=(y_1,y_2,y_3,y_4)$. Then the square of the distance is given by
$$d(IP_1,IP_2)^2=1000\,(x_1-y_1)^2+100\,(x_2-y_2)^2+10\,(x_3-y_3)^2+(x_4-y_4)^2$$
For $IP_1=192.168.1.1,\,IP_2=192.168.1.2$ the distance is clearly 1.
However, for $192.168.1.1$ and $191.168.1.1$ the distance is $\sqrt{1000}\approx 32$.
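The two numbers above can be reproduced with a short sketch of the diagonal case. The names `parse_ip` and `weighted_distance` are hypothetical helpers of mine, and the weights are just the example values $(1000, 100, 10, 1)$, not a recommendation.

```python
import math

# Diagonal entries of the example metric tensor, one weight per octet.
WEIGHTS = (1000, 100, 10, 1)


def parse_ip(ip):
    """Split a dotted-decimal IPv4 string into a tuple of four integers."""
    return tuple(int(part) for part in ip.split("."))


def weighted_distance(ip_a, ip_b, weights=WEIGHTS):
    """Square root of the weighted sum of squared octet differences."""
    x, y = parse_ip(ip_a), parse_ip(ip_b)
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))


print(weighted_distance("192.168.1.1", "192.168.1.2"))
print(weighted_distance("192.168.1.1", "191.168.1.1"))
```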
You could then play around with different weights and impose a kind of normalization by fixing the value of the maximal distance $d(0.0.0.0,FF.FF.FF.FF)$.
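As a sketch of such a normalization, assuming a diagonal metric with positive weights: every octet difference between `0.0.0.0` and `FF.FF.FF.FF` is 255, so the maximal distance is $255\sqrt{\sum_i w_i}$, and rescaling all weights by a common factor fixes it to any target value. The function names and the default target of 1 are my own assumptions.

```python
import math


def max_distance(weights):
    """Distance between 0.0.0.0 and 255.255.255.255 for diagonal weights."""
    return 255 * math.sqrt(sum(weights))


def normalized_weights(weights, target=1.0):
    """Rescale the weights so that the maximal distance equals `target`."""
    # Distances scale with the square root of the weights, hence the square.
    scale = (target / max_distance(weights)) ** 2
    return tuple(w * scale for w in weights)


weights = normalized_weights((1000, 100, 10, 1))
print(weights)
print(max_distance(weights))
```

The rescaling keeps the ratios between the weights, so the relative importance of the octets is unchanged.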
Furthermore, this setup allows for more complex descriptions of your data, where the relevant distance would contain "cross-products" of coordinates such as $g_{13}(x_1-y_1)(x_3-y_3)$.
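A minimal sketch of the general case with cross terms follows. The off-diagonal value $g_{13}=g_{31}=5$ is an arbitrary number I picked for illustration, and `tensor_distance` is a hypothetical name. For the result to remain a metric, $g$ has to be symmetric and positive definite, which holds for this particular matrix.

```python
import math

# Example metric tensor with one cross term between the 1st and 3rd octets.
# The value 5 is arbitrary; g must stay symmetric and positive definite.
G = [
    [1000, 0, 5, 0],
    [0, 100, 0, 0],
    [5, 0, 10, 0],
    [0, 0, 0, 1],
]


def tensor_distance(x, y, g=G):
    """sqrt(sum_ij g[i][j] * (x_i - y_i) * (x_j - y_j)) for octet tuples."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    squared = sum(g[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(squared)


print(tensor_distance((192, 168, 1, 1), (192, 168, 1, 2)))
print(tensor_distance((192, 168, 1, 1), (191, 168, 2, 1)))
```

In the second call the differences are $(1, 0, -1, 0)$, so the cross term contributes $2\cdot 5\cdot(1)(-1)=-10$ and cancels the $+10$ from the third octet.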
EDIT: While this would be a better "weighting" method than the others I addressed, I realize now that it is actually meaningless: as Anony-Mousse and Phillip mention, IP addresses are indeed 32-dimensional. This means in particular that giving the same weight to all bits in, say, the 2nd group is in general not sound: one bit could be part of the netmask while another is not. See Anony-Mousse's answer for additional objections.