How to deal with address (like zip-code) for training a model?

Question


aRedDish

May 4, 2022 20:11

To me, it doesn't make sense to normalize an address, even when it is a numerical variable like a zip code. Shouldn't an address be interpreted as a categorical feature, like a neighborhood?
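To make the categorical view concrete, here is a minimal pure-Python sketch that keeps zip codes as strings (so leading zeros such as "02139" survive) and one-hot encodes them instead of normalizing them. The function name `one_hot_encode` and the sample zip codes are illustrative assumptions; in practice, pandas `get_dummies` or scikit-learn's `OneHotEncoder` does the same job.

```python
from typing import List, Tuple


def one_hot_encode(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode string categories; returns (vocabulary, rows)."""
    vocab = sorted(set(values))  # deterministic column order
    position = {v: i for i, v in enumerate(vocab)}
    rows = []
    for v in values:
        row = [0] * len(vocab)
        row[position[v]] = 1  # a single 1 in the column for this category
        rows.append(row)
    return vocab, rows


# Zip codes are read as strings, not integers: arithmetic on them
# (averages, scaling, differences) has no well-defined meaning,
# and casting to int would turn "02139" into 2139.
zips = ["02139", "10001", "94103", "02139"]
vocab, encoded = one_hot_encode(zips)
print(vocab)    # ['02139', '10001', '94103']
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each distinct code becomes its own column, so the model sees "same zip code or not" rather than a spurious numeric ordering.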

Suppose I have geolocation data (latitude and longitude). The best approach seems to be to apply k-means clustering and then work with the cluster labels, which I encode.
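The clustering approach described above can be sketched as follows: plain Lloyd's k-means on (latitude, longitude) pairs, with the resulting cluster label turned into a categorical string feature. This is a minimal sketch under stated assumptions: distance is Euclidean on raw degrees (only a rough approximation over small areas, since a degree of longitude shrinks toward the poles), `k` is chosen by hand, and the function name `kmeans` and the sample coordinates are hypothetical. In practice, scikit-learn's `KMeans` would normally be used instead of a hand-written loop.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point.

    Assumption: Euclidean distance on raw degrees (rough approximation).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


# Two hypothetical groups of coordinates (around Boston and San Francisco).
points = [
    (42.36, -71.06), (42.35, -71.10), (42.37, -71.03),
    (37.77, -122.42), (37.78, -122.41), (37.76, -122.45),
]
labels = kmeans(points, k=2)
# The label is an arbitrary ID, so treat it as a category, not a number.
region_feature = [f"cluster_{lab}" for lab in labels]
print(region_feature)
```

The `region_feature` strings can then be one-hot encoded like any other categorical variable. Which integer each group receives depends on the random initialization, which is one more reason not to feed the raw label to a model as a number.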

If the answer is "it depends", please tell me on what.

Topics: categorical-encoding, geospatial, machine-learning

Category: Data Science

aRedDish · Accepted Answer · May 4, 2022 20:11

In the book "Machine Learning Engineering" by Andriy Burkov (chapter 4.12.4), it is recommended to treat zip codes as categorical, just as "country" would be. The goal is to reduce the cardinality (i.e., the number of unique values) of such variables in order to avoid "several modes" depending on that feature.
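One common way to cut the cardinality of a zip-code feature is sketched below: coarsen each code to a prefix, then lump rare prefixes into a single "OTHER" bucket. This is an illustrative technique, not necessarily the exact procedure from the book; the function name `reduce_zip_cardinality` and the defaults `prefix_len=3` and `min_count=2` are hypothetical choices that would need tuning on real data.

```python
from collections import Counter
from typing import List


def reduce_zip_cardinality(
    zip_codes: List[str], prefix_len: int = 3, min_count: int = 2
) -> List[str]:
    """Coarsen zip codes to a prefix, then lump rare prefixes into 'OTHER'.

    prefix_len and min_count are illustrative defaults, not recommendations.
    """
    prefixes = [z[:prefix_len] for z in zip_codes]  # e.g. "02139" -> "021"
    counts = Counter(prefixes)
    # Prefixes seen fewer than min_count times share one catch-all category.
    return [p if counts[p] >= min_count else "OTHER" for p in prefixes]


zips = ["02139", "02142", "02115", "10001", "10002", "94103"]
reduced = reduce_zip_cardinality(zips)
print(reduced)  # ['021', '021', '021', '100', '100', 'OTHER']
print(len(set(zips)), "->", len(set(reduced)))  # 6 -> 3
```

The reduced values are still categorical and can be one-hot encoded afterwards; the trade-off is that a shorter prefix or a higher `min_count` gives fewer, better-populated categories at the cost of geographic detail.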

