Mixed Data Type Classification / Neighbor Algorithm

Question


CyberBully2003

May 4, 2022, 18:48

Here is a hypothetical, simplified dataframe for my problem. The real data would be low-dimensional (roughly 20 features); this example contains some made-up information about certain dog breeds:

Breed	Min_Weight	Max_Weight	Min_Height	Max_Height	is_friendly	grp
Husky	10	20	30	35	True	working
Poodle	8	17	15	30	False	terrier

The algorithm would receive some information about a dog and would need to identify the k closest dog breeds based on that input. It also needs to be fast.

Example: algorithm receives an unknown breed with data:

Weight	Height	is_friendly	grp
18	23	1	terrier

Returns: the k closest breeds from our sample dataframe, along with how close each one is

What sort of algorithm or model makes sense here, given the mix of variable types: ranges (min and max weight and height; I am guessing I will need to generate data to fill in these ranges), Boolean values, and categories such as grp?

Also, is there a way to weight certain characteristics? For example, if we are confident in the measurement of the unknown dog's weight, that feature should have more influence when choosing a breed; if we are not confident about the height, its influence should be reduced. How should I approach this problem?
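To make the desired behavior concrete, here is a minimal sketch of one possible approach: a weighted, Gower-style distance that handles each feature type separately. Everything in it is an assumption for illustration, not a settled answer: the `Breed` class, the `interval_gap` and `k_closest` helpers, the choice to treat a value inside a breed's min/max range as distance 0, and the example weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Breed:
    name: str
    min_weight: float
    max_weight: float
    min_height: float
    max_height: float
    is_friendly: bool
    grp: str


def interval_gap(value, low, high, scale):
    """Return 0 inside [low, high]; otherwise the gap to the nearest bound, scaled to [0, 1]."""
    gap = max(low - value, 0.0, value - high)
    return min(gap / scale, 1.0) if scale > 0 else 0.0


def k_closest(breeds, query, weights, k=1):
    """Rank breeds by a weighted average of per-feature distances (each in [0, 1])."""
    # Assumption: normalize numeric gaps by the overall span seen across all breeds.
    w_scale = max(b.max_weight for b in breeds) - min(b.min_weight for b in breeds)
    h_scale = max(b.max_height for b in breeds) - min(b.min_height for b in breeds)
    total_weight = sum(weights.values())

    scored = []
    for b in breeds:
        parts = {
            "weight": interval_gap(query["weight"], b.min_weight, b.max_weight, w_scale),
            "height": interval_gap(query["height"], b.min_height, b.max_height, h_scale),
            # Boolean and categorical features: 0 on a match, 1 on a mismatch.
            "is_friendly": 0.0 if bool(query["is_friendly"]) == b.is_friendly else 1.0,
            "grp": 0.0 if query["grp"] == b.grp else 1.0,
        }
        dist = sum(weights[f] * d for f, d in parts.items()) / total_weight
        scored.append((dist, b.name))

    scored.sort()
    return [(name, dist) for dist, name in scored[:k]]


breeds = [
    Breed("Husky", 10, 20, 30, 35, True, "working"),
    Breed("Poodle", 8, 17, 15, 30, False, "terrier"),
]
query = {"weight": 18, "height": 23, "is_friendly": 1, "grp": "terrier"}

# Hypothetical confidence weights: trust weight more, height less.
weights = {"weight": 2.0, "height": 0.5, "is_friendly": 1.0, "grp": 1.0}

result = k_closest(breeds, query, weights, k=2)
for name, dist in result:
    print(f"{name}: {dist:.3f}")
```

Because each per-feature distance lies in [0, 1], the weights act directly as relative influence, so raising the weight on a trusted measurement makes mismatches on it count for more. This brute-force loop is linear in the number of breeds; whether that is fast enough depends on the dataset size.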

Topic: k-nn, machine-learning-model, classification, algorithms, clustering

Category Data Science
