How to handle both categorical and ordinal features in a single dataset?
I was practicing Lasso regression with the SPARCS hospital dataset. There are two kinds of features in the dataset:
Categorical features, like the location of the hospital, patient demographics, etc.
Ordinal features, like length of stay, severity of illness, mortality rate, etc.
When preprocessing the dataset, I created new features by one-hot encoding the categorical features into a DataFrame, say X_cardi, and by generating polynomial features for the ordinal features into another DataFrame, X_ordi.
import pandas as pd
from sklearn.linear_model import LassoLars

# Combine the polynomial (ordinal) and one-hot (categorical) features
# column-wise; keep the column names so weights can be mapped back to features
X_combined = pd.concat([X_ordi, X_cardi], axis=1)

# best_lambda was chosen earlier
lasso = LassoLars(alpha=best_lambda)
weights = lasso.fit(X_combined, Y).coef_
When I concatenate these DataFrames and train LassoLars to select the viable features, as in the code snippet above, the weights array comes out full of 0.0. I suspect this has to do with most of the columns in X_combined
being 0.0, with only a few 1.0 values, due to the one-hot encoding, and with the polynomial features suffering as a result. Is this the case, and how can I handle it?
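For reference, the all-zero symptom is easy to reproduce on synthetic data: LassoLars shrinks every coefficient to exactly zero once the penalty alpha is large relative to the feature/target scales, so checking np.count_nonzero(coef_) at different alphas is a quick diagnostic (the data below is synthetic, not SPARCS):

```python
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 5 synthetic features
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

# Small penalty: Lasso keeps at least the informative feature
weights = LassoLars(alpha=0.01).fit(X, y).coef_
n_selected = np.count_nonzero(weights)

# Overly large penalty: every coefficient is shrunk to exactly 0.0
weights_big = LassoLars(alpha=100.0).fit(X, y).coef_
n_selected_big = np.count_nonzero(weights_big)
```

So an all-zero weights array is also consistent with the penalty being too large for the scale of the features, not only with the sparsity of the one-hot columns.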
Topic lasso scikit-learn feature-selection categorical-data machine-learning
Category Data Science