Automatic detection of ML problem type: Regression or Classification
I am trying to design an algorithm that, based on the training data, automatically detects the ML problem type: regression or classification.
Needless to say, no such algorithm can be correct in 100% of cases. The goal is to find a heuristic that is wrong in at most 10% of cases.
The first obvious, naive idea is to treat the problem as regression when at least 80% of the target values are unique. Yet for small data sets this can be wrong. For example, a data set of 125 records labeled with 100 classes has 80% unique target values, so the naive approach would call it a regression problem when it is in fact a multi-class classification problem.
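To make the naive rule and its failure case concrete, here is a minimal Python sketch. The function name `guess_problem_type` and the parameter `unique_ratio_threshold` are hypothetical names chosen for illustration, and the sketch assumes the rule is applied to the target column only.

```python
from typing import Hashable, Sequence


def guess_problem_type(
    targets: Sequence[Hashable],
    unique_ratio_threshold: float = 0.8,  # the 80% cut-off from the naive rule
) -> str:
    """Naive heuristic: many distinct target values -> regression."""
    if len(targets) == 0:
        raise ValueError("targets must not be empty")
    # Fraction of distinct values among all target values.
    unique_ratio = len(set(targets)) / len(targets)
    if unique_ratio >= unique_ratio_threshold:
        return "regression"
    return "classification"


# The counterexample: 125 records labeled with 100 distinct classes.
labels = [f"class_{i % 100}" for i in range(125)]
print(guess_problem_type(labels))
```

Here 100 of the 125 target values are distinct, so the ratio reaches the 80% threshold and the rule reports regression even though the labels are class names.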
Any ideas or links to existing work in this area? Thanks!
Topic automl regression classification
Category Data Science