What are the best practices for deciding whether a variable is categorical?
What are some systematic ways to classify variables as categorical or numeric? I believe that relying only on intuition here can often lead to major errors that are hard to reverse later. What are the best strategies for classifying variables?
For example, the dataframe I'm working with has several categorical variables, such as is_holiday, which has labels for several holidays. However, certain variables like visibility_in_miles look as though they should be treated as categorical too. Part of the reason is that, while most variables have hundreds of unique values, some have only 9 distinct values.
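To make the unique-value comparison concrete, here is a minimal sketch of how the per-column counts could be tabulated with pandas. The helper name `summarize_cardinality`, the `max_unique` threshold, and the toy data are all hypothetical and only stand in for the real dataframe. The flag only marks columns worth a closer look; it does not decide that a column is categorical.

```python
import pandas as pd


def summarize_cardinality(df: pd.DataFrame, max_unique: int = 20) -> pd.DataFrame:
    """Report dtype and number of distinct values per column.

    `max_unique` is an arbitrary, illustrative threshold: columns at or
    below it are flagged for manual review, nothing more.
    """
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(dropna=True),
    })
    # Share of rows that are distinct; close to 1.0 suggests a continuous value or an ID.
    summary["unique_ratio"] = summary["n_unique"] / len(df)
    summary["low_cardinality"] = summary["n_unique"] <= max_unique
    return summary


# Toy data (made up) using the two column names from the question.
df = pd.DataFrame({
    "is_holiday": ["None", "Christmas Day", "None", "Labor Day", "None", "None"],
    "visibility_in_miles": [1, 5, 9, 5, 1, 9],
    "temperature": [280.1, 281.4, 279.9, 285.2, 290.0, 288.7],
})

summary = summarize_cardinality(df, max_unique=3)
print(summary)
```

In this toy example both is_holiday and visibility_in_miles are flagged, even though one holds string labels and the other holds ordered numeric measurements. That is why a low count on its own seems like weak evidence for treating a column as categorical.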
Topic structured-data data-cleaning categorical-data
Category Data Science