How may I find the formula that produces the greatest distance between data points using machine learning?

Question

JS4137

October 25, 2021 13:39

I currently have a data set shown below:

0.35535  0.32226  0.35594  0.38433  0.32773  0.34685  0.35475  0.37606
0.42278  0.34502  0.45573  0.54538  0.35488  0.40833  0.43780  0.48279
0.34622  0.32314  0.36684  0.41292  0.32893  0.35636  0.36386  0.38715
0.35892  0.33035  0.41856  0.47302  0.33769  0.37625  0.38597  0.42510
0.32681  0.31423  0.35694  0.38962  0.32438  0.34359  0.34893  0.36110
0.31092  0.30892  0.32405  0.33759  0.31260  0.31992  0.32202  0.33002

I am trying to use machine learning to find a formula $f(n) = k$, where $n$ is one of these data points and $k$ is an arbitrary output value, such that the values of $k$ are as far apart from one another as possible. That is, if $f(n_1) = k_1$ and $f(n_2) = k_2$, I want the distance between $k_1$ and $k_2$ to be as large as possible.
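To make the goal concrete, here is a minimal sketch of one possible score. It rests on two assumptions that are not part of the problem statement: the outputs are rescaled to [0, 1], so that a formula cannot win simply by multiplying everything by a large constant, and "as far apart as possible" is read as "the smallest gap between neighbouring outputs is as large as possible". The name `separation_score` is made up for illustration.

```python
def separation_score(values):
    """Smallest gap between neighbouring outputs after rescaling to [0, 1].

    Assumption: a larger smallest gap means better separated outputs.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All outputs coincide (or there is only one), so nothing is separated.
        return 0.0
    scaled = sorted((v - lo) / (hi - lo) for v in values)
    return min(b - a for a, b in zip(scaled, scaled[1:]))


# First row of the data set, scored as-is (i.e. with f(n) = n)
row = [0.35535, 0.32226, 0.35594, 0.38433, 0.32773, 0.34685, 0.35475, 0.37606]
identity_score = separation_score(row)

# Eight evenly spaced outputs, for comparison
evenly_spaced_score = separation_score([0, 1, 2, 3, 4, 5, 6, 7])
```

Under this reading, eight outputs can score at most 1/7, which the evenly spaced list reaches. The raw first row scores far lower because several of its values nearly coincide.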

I have written some pseudocode which is my best understanding of how to do so:

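# Sketch of the intended pipeline (the ML step is still a placeholder)

from statistics import median

from sympy import latex, symbols

x, c = symbols("x c")

def generate_formulas(seed_formulas):
    # Placeholder for the ML step: a trained model would propose new
    # formulas here. For now, just derive a simple variant of each seed.
    generated_formulas = []
    for formula in seed_formulas:
        generated_formulas.append(formula)
        generated_formulas.append(formula**2)
    return generated_formulas

def get_distribution(eq, data):
    # Apply the formula to every data point: k = f(n)
    return [float(eq.subs(x, data_point))
            for country_data in data
            for data_point in country_data]

def rank_distribution(eq, data):
    # Score the formula by the median distance between neighbouring
    # outputs once they are sorted (larger means further apart)
    values = sorted(get_distribution(eq, data))
    return median(b - a for a, b in zip(values, values[1:]))

# One row per country, copied from the data set above
data = [
    [0.35535, 0.32226, 0.35594, 0.38433, 0.32773, 0.34685, 0.35475, 0.37606],
    [0.42278, 0.34502, 0.45573, 0.54538, 0.35488, 0.40833, 0.43780, 0.48279],
    [0.34622, 0.32314, 0.36684, 0.41292, 0.32893, 0.35636, 0.36386, 0.38715],
    [0.35892, 0.33035, 0.41856, 0.47302, 0.33769, 0.37625, 0.38597, 0.42510],
    [0.32681, 0.31423, 0.35694, 0.38962, 0.32438, 0.34359, 0.34893, 0.36110],
    [0.31092, 0.30892, 0.32405, 0.33759, 0.31260, 0.31992, 0.32202, 0.33002],
]

# Dataset will be the list of known equations
seed_formulas = [
    ((x + c) / x).subs(c, 1),  # c fixed to 1 so the result is numeric
    5 * x + 8,
    2 * x / (x + 1),  # parenthesized: 2 * x / x + 1 simplifies to 3
]

data_output = {}
for eq in generate_formulas(seed_formulas):
    data_output[latex(eq)] = rank_distribution(eq, data)

print(data_output)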

My issue is how to process the equations in the first place, because most machine learning algorithms cannot take equations as data points. The closest question I have found on Stack Exchange is this one, but it doesn't solve my issue. Suggestions are highly appreciated.
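To illustrate what "equations as data points" might look like, here is a minimal sketch of one possible encoding, not a known solution: each formula is flattened into prefix-order tokens, and each token is mapped to an integer id that a sequence model could consume. It uses Python's standard `ast` module on formula strings as a stand-in; a SymPy expression could be walked the same way through its `func` and `args` attributes. The helper names `to_prefix_tokens` and `encode` are hypothetical.

```python
import ast


def to_prefix_tokens(formula):
    """Flatten a formula string into prefix-order tokens."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            # Operator first, then the left and right subtrees
            return [type(node.op).__name__] + walk(node.left) + walk(node.right)
        if isinstance(node, ast.Name):
            return [node.id]
        if isinstance(node, ast.Constant):
            return [str(node.value)]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval").body)


def encode(token_lists):
    """Assign an integer id to every distinct token (a toy vocabulary)."""
    distinct = sorted({tok for tokens in token_lists for tok in tokens})
    vocab = {tok: i for i, tok in enumerate(distinct)}
    return vocab, [[vocab[tok] for tok in tokens] for tokens in token_lists]


formulas = ["(x + c) / x", "5 * x + 8", "2 * x / (x + 1)"]
token_lists = [to_prefix_tokens(f) for f in formulas]
vocab, encoded = encode(token_lists)
```

Because every operator here takes exactly two operands, the token sequence determines the expression tree uniquely, so a generated sequence can be decoded back into a formula and scored.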

Topic generalization neural-network machine-learning

Category Data Science
