How to preprocess an ordered categorical variable to feed a machine learning algorithm?

Question

marcus

June 4, 2022 22:00

I have a categorical variable that measures the income of a family:

A: no income
B: Up to $500
C: $500-$700
…
P: $5000-$6000
Q: More than $6000

It seems odd to me that I have to create dummy variables for this feature, since it's ordered. I wonder if it's better to map the values to integers, {'A': 0, 'B': 1, …, 'Q': 16}, so I can feed them to the algorithm as numbers.
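A minimal sketch of such an integer mapping, in plain Python. It assumes the elided categories use every letter from A to Q in income order, which gives 17 categories with ranks 0 to 16. The names `ORDINAL_MAP` and `encode_income` and the sample values are hypothetical, chosen only for illustration.

```python
import string

# Assumption: the categories are the consecutive letters 'A' .. 'Q',
# already sorted from lowest to highest income.
CATEGORIES = list(string.ascii_uppercase[:17])

# Map each letter to its rank, so a larger integer means a higher income bracket.
ORDINAL_MAP = {letter: rank for rank, letter in enumerate(CATEGORIES)}


def encode_income(values):
    """Replace each income letter with its integer rank."""
    return [ORDINAL_MAP[v] for v in values]


sample = ["A", "C", "Q", "B"]  # hypothetical column values
encoded = encode_income(sample)
print(encoded)
```

Note that the integers preserve only the order of the brackets, not their width: the step from 0 to 1 and the step from 15 to 16 look identical to the model even though the dollar ranges differ.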

What's the proper way of preprocessing this variable to feed an algorithm such as Random Forest or a simple neural network?

Topic data-wrangling preprocessing dataset machine-learning

Category Data Science

Carlos Mougan · Accepted Answer · August 21, 2020 06:31

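One way to do this is to use target encoding, which replaces each category with a statistic of the target computed over the rows in that category (typically the mean).

(There are plenty of resources for learning target encoding.)

This way your categories are ordered not just by an arbitrary number but by the target value, which is ultimately what you want in order to get better predictions.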
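To make the idea concrete, here is a minimal sketch of mean target encoding in plain Python. It is not taken from any particular library: the function names, the toy data, and the choice to fall back to the global mean for unseen categories are all assumptions made for illustration. The encoding should be fitted on training rows only, otherwise the target leaks into the feature.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Return (per-category mean of the target, global mean of the target)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    global_mean = sum(targets) / len(targets)
    return means, global_mean


def apply_target_encoding(categories, means, global_mean):
    """Replace each category by its learned mean; unseen ones get the global mean."""
    return [means.get(cat, global_mean) for cat in categories]


# Hypothetical training data: income letter and a binary target.
train_income = ["A", "A", "B", "B", "B", "Q"]
train_target = [0, 1, 0, 0, 1, 1]

means, global_mean = fit_target_encoding(train_income, train_target)

# "C" never appears in the training data, so it receives the global mean.
encoded = apply_target_encoding(["A", "B", "Q", "C"], means, global_mean)
print(encoded)
```

Categories with very few rows (such as "Q" above, with a single row) get noisy means, so practical implementations usually smooth each category mean toward the global mean.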
