How can I use machine learning to find the formula that produces the greatest distance between data points?
I currently have the data set shown below:
```
0.35535 0.32226 0.35594 0.38433 0.32773 0.34685 0.35475 0.37606
0.42278 0.34502 0.45573 0.54538 0.35488 0.40833 0.43780 0.48279
0.34622 0.32314 0.36684 0.41292 0.32893 0.35636 0.36386 0.38715
0.35892 0.33035 0.41856 0.47302 0.33769 0.37625 0.38597 0.42510
0.32681 0.31423 0.35694 0.38962 0.32438 0.34359 0.34893 0.36110
0.31092 0.30892 0.32405 0.33759 0.31260 0.31992 0.32202 0.33002
```
I am trying to use machine learning to find a formula $f(n) = k$, where $n$ is each of these data points and $k$ is an arbitrary output value, such that the values of $k$ are as far apart from one another as possible. That is, if $f(n_1) = k_1$ and $f(n_2) = k_2$, I want $k_1$ and $k_2$ to be as far apart as possible for every pair of data points.
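As stated, the objective has no upper bound: multiplying any $f$ by 10 multiplies every distance between outputs by 10, so a larger scale always "wins". One way to make formulas comparable is to score the smallest gap between the sorted outputs relative to their total range, which is unchanged by rescaling or shifting $f$. The sketch below assumes each $n$ is a single number from the table; the function name `spread_score` is hypothetical.

```python
def spread_score(f, points):
    """Smallest gap between sorted outputs f(n), divided by their total range.

    The score is 0 when two outputs coincide and at most 1 / (len(points) - 1),
    which is reached when the outputs are evenly spaced. Rescaling or shifting
    f (e.g. 10 * f + 3) leaves the score unchanged up to floating-point rounding.
    """
    values = sorted(f(n) for n in points)
    spread = values[-1] - values[0]
    if spread == 0:
        return 0.0  # constant formula: every output coincides
    gaps = [b - a for a, b in zip(values, values[1:])]
    return min(gaps) / spread


# First row of the data set from the question
row = [0.35535, 0.32226, 0.35594, 0.38433, 0.32773, 0.34685, 0.35475, 0.37606]
print(spread_score(lambda n: n, row))
print(spread_score(lambda n: 10 * n + 3, row))  # same score, up to rounding
```

With a scale-free score like this, "as far apart as possible" becomes "as evenly spaced as possible", which is a well-defined target a search or learning procedure can optimize.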
I have written some pseudocode that reflects my best understanding of how to do this:
```python
# Pseudocode for the machine learning, made runnable with sympy.
# generate_formulas is still a placeholder: `ml.train` and `predict`
# in the original sketch are not real library calls.
from statistics import median

from sympy import latex, symbols

x, c = symbols("x c")
C_VALUE = 1  # assumption: c is unspecified in the original, so c = 1 is used

data_output = {}


def generate_formulas(dataset):
    # Placeholder: a trained model would propose new formulas based on
    # the existing ones; here the known formulas are passed through as-is.
    generated_formulas = []
    for formula in dataset:
        generated_formulas.append(formula)
    return generated_formulas


def get_distribution(eq, data):
    # Create a data array by evaluating the equation at every data point
    data_array = []
    for country_data in data:
        for data_point in country_data:
            res = eq.subs({x: data_point, c: C_VALUE})
            data_array.append(float(res))
    # Distance between each pair of neighboring values once sorted
    data_array.sort()
    return [b - a for a, b in zip(data_array, data_array[1:])]


def rank_distribution(eq, data):
    # Score the formula by the median distance between neighboring values
    return median(get_distribution(eq, data))


def main(data):
    # Dataset is the list of known equations
    dataset = [
        (x + c) / x,
        5 * x + 8,
        2 * x / x + 1,  # note: sympy simplifies this to the constant 3
    ]
    for eq in generate_formulas(dataset):
        equation = latex(eq)
        equation_score = rank_distribution(eq, data)
        data_output[equation] = equation_score
        print(equation, equation_score)
```
My issue is how to represent the equations in the first place, because most machine learning algorithms cannot take equations as input data points. The closest question I have found on Stack Exchange is this one, but it doesn't solve my issue. Any suggestions are highly appreciated.
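One way around feeding equations to a learning algorithm is to represent each candidate formula as a small expression tree and search over the trees directly; this is the idea behind symbolic regression and genetic programming. The sketch below is a plain exhaustive search over a tiny hand-picked grammar, not machine learning. The grammar, the nested-tuple representation, and the helper names `evaluate`, `to_string`, `enumerate_trees`, `gap_score` and `best_formula` are all hypothetical choices for illustration. Formulas are scored by the smallest gap between sorted outputs divided by their range, so simply rescaling a formula cannot improve its score.

```python
import itertools
import math

# Hypothetical representation: a formula is a nested tuple, one of
#   ("x",)  ("const", value)  ("unary", name, child)  ("binary", name, left, right)
UNARY = {"log": math.log, "sqrt": math.sqrt, "exp": math.exp}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
LEAVES = [("x",), ("const", 1.0), ("const", 2.0)]


def evaluate(tree, x):
    """Compute the value of a formula tree at the number x."""
    kind = tree[0]
    if kind == "x":
        return x
    if kind == "const":
        return tree[1]
    if kind == "unary":
        return UNARY[tree[1]](evaluate(tree[2], x))
    return BINARY[tree[1]](evaluate(tree[2], x), evaluate(tree[3], x))


def to_string(tree):
    """Render a formula tree as a readable infix string."""
    kind = tree[0]
    if kind == "x":
        return "x"
    if kind == "const":
        return f"{tree[1]:g}"
    if kind == "unary":
        return f"{tree[1]}({to_string(tree[2])})"
    return f"({to_string(tree[2])} {tree[1]} {to_string(tree[3])})"


def enumerate_trees(depth):
    """Every tree up to the given depth; the count grows very fast with depth."""
    if depth == 0:
        return list(LEAVES)
    smaller = enumerate_trees(depth - 1)
    trees = list(smaller)
    for name in UNARY:
        trees.extend(("unary", name, t) for t in smaller)
    for name in BINARY:
        trees.extend(
            ("binary", name, a, b) for a, b in itertools.product(smaller, repeat=2)
        )
    return list(dict.fromkeys(trees))  # drop duplicates, keep order


def gap_score(values):
    """Smallest gap between sorted values divided by their range."""
    values = sorted(values)
    spread = values[-1] - values[0]
    if spread < 1e-9:
        return 0.0  # (near-)constant output; also guards against rounding noise
    return min(b - a for a, b in zip(values, values[1:])) / spread


def best_formula(points, depth=2):
    """Score every tree exhaustively and return the best one with its score."""
    best_tree, best_score = None, -1.0
    for tree in enumerate_trees(depth):
        try:
            values = [evaluate(tree, p) for p in points]
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # formula is undefined somewhere on the data
        if not all(math.isfinite(v) for v in values):
            continue
        score = gap_score(values)
        if score > best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score


# First row of the data set from the question
row = [0.35535, 0.32226, 0.35594, 0.38433, 0.32773, 0.34685, 0.35475, 0.37606]
tree, score = best_formula(row)
print(to_string(tree), score)
```

Exhaustive enumeration only works for very shallow trees; a learning-based approach (for example, a genetic algorithm that mutates and recombines the best trees) would replace `enumerate_trees` while keeping the same tree representation and score.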
Tags: generalization, neural-network, machine-learning
Category Data Science