Separating numerical and categorical features in a binary classification problem

I have an employee dataset with around 9,500 rows and need to predict whether the target is 0 or 1. My features include the employee's department, gender, salary, review_score (numerical), average_number_of_hours per month, bonus (1 or 0), the number of projects the employee is involved in, and tenure.

My question is whether number of projects (3, 4, 5, 6) and tenure (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) should be treated as categories rather than numerical values. I could make them ordinal.

However, I am not sure about treating tenure (the number of years an employee has been with the company) as a category, because it has so many distinct values.

I will be using linear/logistic models to predict the target '1', and will also be trying to find the best features.

Can somebody explain whether 'tenure' and 'the number of projects' should be treated as numerical or categorical here, and why? Is there a generally accepted limit on the number of distinct values in a category?

Topic binary-classification categorical-encoding numerical scikit-learn categorical-data

Category Data Science


Personally, I would treat both the number of projects and tenure as numerical.

In general, there are three data types: numerical, categorical, and ordinal. There is no strict statistical definition that assigns a column to one of them; to me it is more of a rule of thumb.

From the machine learning perspective, though, the tricky point is that how we interpret the columns does not matter by itself. What matters is how we encode them, because the encoding affects the metrics/performance of the estimators.

Take tenure as an example. If we apply one-hot encoding, every category becomes a new column, which increases the number of features. If we apply label encoding, the values are mapped to the integers 0 to N-1 (for N distinct values), which changes the scale of the original column ([2,3,4,5] -> [0,1,2,3]).
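To make the difference concrete, here is a minimal sketch using scikit-learn's OneHotEncoder and OrdinalEncoder. The tenure values are a small made-up sample for illustration, not the asker's data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical tenure values (years), shaped as a single-column 2D array
# because scikit-learn encoders expect (n_samples, n_features).
tenure = np.array([[2], [3], [4], [5], [3]])

# One-hot encoding: one new 0/1 column per distinct tenure value.
one_hot = OneHotEncoder().fit_transform(tenure).toarray()

# Ordinal (label-style) encoding: each distinct value is mapped to an
# integer 0..N-1 following the sorted order of the values.
ordinal = OrdinalEncoder().fit_transform(tenure)

print(one_hot.shape)    # (5, 4): 4 distinct values -> 4 columns
print(ordinal.ravel())  # [0. 1. 2. 3. 1.]
```

With the full tenure range of 2 to 12, one-hot encoding would produce 11 columns, while the ordinal version stays a single column that is just the original value shifted by 2.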

We could try different encoding methods, run some experiments, and choose the one that gets the best score.
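One way to run such an experiment is sketched below: the same logistic regression is cross-validated twice, once with tenure and number of projects scaled as numbers and once with them one-hot encoded. Everything about the data here is an assumption: the DataFrame is randomly generated, the target is made up, and the column names only mirror the question. ROC AUC is also just one possible choice of score.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the asker's dataset); replace with real data.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "tenure": rng.integers(2, 13, size=n),             # 2..12
    "number_of_projects": rng.integers(3, 7, size=n),  # 3..6
    "review_score": rng.uniform(0.3, 1.0, size=n),
})
# Made-up target that depends on tenure and review_score, so the
# models have some signal to learn from.
logit = 0.3 * (X["tenure"] - 7) - 2.0 * (X["review_score"] - 0.65)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

count_cols = ["tenure", "number_of_projects"]

def make_model(treat_counts_as):
    """Build a pipeline that treats the count columns as numerical or categorical."""
    if treat_counts_as == "numerical":
        counts = StandardScaler()
    else:
        counts = OneHotEncoder(handle_unknown="ignore")
    pre = ColumnTransformer([
        ("counts", counts, count_cols),
        ("score", StandardScaler(), ["review_score"]),
    ])
    return make_pipeline(pre, LogisticRegression(max_iter=1000))

# Compare both treatments with 5-fold cross-validation.
scores = {}
for option in ("numerical", "categorical"):
    cv_scores = cross_val_score(make_model(option), X, y, cv=5, scoring="roc_auc")
    scores[option] = cv_scores.mean()
    print(f"{option}: mean ROC AUC = {scores[option]:.3f}")
```

Putting the encoder inside the pipeline means it is refit on each training fold, so the comparison between the two treatments is not affected by information leaking from the validation folds.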
