What are the best practices for deciding whether a variable is categorical?

What are some systematic ways to categorise variables as categorical or numeric? I believe that relying on intuition alone in such scenarios can often lead to major, irreversible errors. What are the best strategies for categorising variables?

For example, the dataframe I'm working with has several categorical variables such as is_holiday, which has labels for several holidays. However, certain variables like visibility_in_miles look as though they too need to be treated as categorical. Part of the reason is that while most variables have hundreds of unique values, some have only 9.

Topic structured-data data-cleaning categorical-data

Category Data Science


The number of unique values a variable takes does not determine whether it is categorical.
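As a small illustration of this point, here is a sketch using pandas. Only the column name visibility_in_miles comes from the question; the station_id column and all of the values are made up for the example.

```python
import pandas as pd

# Hypothetical toy data: station_id and every value here are invented.
df = pd.DataFrame({
    # Few unique values, but the numbers measure a quantity.
    "visibility_in_miles": [1, 2, 5, 9, 9, 5, 2, 1],
    # Many unique values, but the numbers are only labels.
    "station_id": [101, 102, 103, 104, 105, 106, 107, 108],
})

# Unique-value counts are a useful screening tool, not a decision rule.
print(df.nunique())

# An average visibility is interpretable; an average station_id is not,
# even though pandas will happily compute both.
print(df["visibility_in_miles"].mean())
```

A low count from nunique() is a reason to look at a column more closely, and the decision itself should rest on what the values mean.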

Categorical variables are mutually exclusive, unordered groups. For example, Christmas and Halloween are different holidays but have no order in the concept of "holiday-ness".

Ordinal variables are mutually exclusive, ordered groups with no consistent measure of the distance between the levels. For example, ranking items (e.g., first, second, third, …). There could be big (or small) differences between each of the places.

Numeric variables have consistent differences between values. The difference between 10 and 11 miles is the same as the difference between 20 and 21 miles because "mile-ness" is a consistent measure.
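The three kinds of variable described above map fairly directly onto pandas dtypes. The following is a minimal sketch, assuming the data lives in a pandas DataFrame. The finish_place column and all values are hypothetical; only the names is_holiday and visibility_in_miles come from the question.

```python
import pandas as pd

# Hypothetical toy data; the values and the finish_place column are invented.
df = pd.DataFrame({
    "is_holiday": ["No holiday", "Christmas", "Halloween", "No holiday"],
    "finish_place": ["first", "third", "second", "first"],
    "visibility_in_miles": [1, 9, 5, 9],
})

# Categorical: unordered, mutually exclusive groups.
df["is_holiday"] = df["is_holiday"].astype("category")

# Ordinal: ordered groups with no consistent distance between levels.
place_dtype = pd.CategoricalDtype(["first", "second", "third"], ordered=True)
df["finish_place"] = df["finish_place"].astype(place_dtype)

# Numeric: differences between values are meaningful, so the column stays
# a number even though it takes only a few unique values.
df["visibility_in_miles"] = pd.to_numeric(df["visibility_in_miles"])

print(df.dtypes)
print(df["finish_place"].min())          # ordering is defined for ordinal data
print(df["visibility_in_miles"].diff())  # differences are defined for numeric data
```

Under this reading, visibility_in_miles would stay numeric despite having only 9 unique values, because a mile is a consistent unit.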
