How to handle both categorical and ordinal features in a single dataset?
I was practicing Lasso regression with the SPARCS hospital dataset. There are two kinds of features in the dataset:
Categorical features, like the location of the hospital, patient demographics, etc.
Ordinal features, like length of stay, severity of illness, mortality rate, etc.
When preprocessing the dataset, I created new features by one-hot encoding the categorical features into a DataFrame, say X_cardi, and by generating polynomial features for the ordinal features into another DataFrame, X_ordi.
import pandas as pd
from sklearn.linear_model import LassoLars

# Combine the polynomial (ordinal) and one-hot (categorical) features
# column-wise; keep the column names so weights can be mapped back to features
X_combined = pd.concat([X_ordi, X_cardi], axis=1)

# best_lambda was chosen earlier
lasso = LassoLars(alpha=best_lambda)
weights = lasso.fit(X_combined, Y).coef_
When I concatenate these DataFrames and train LassoLars to select the viable features, as in the code snippet above, the weights array comes out full of 0.0. I suspect this has to do with most of the columns in X_combined
being 0.0, with only a few 1.0 values, due to the one-hot encoding, and with the polynomial features suffering as a result. Is this the case, and how can I handle it?
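For reference, the all-zero symptom is easy to reproduce on synthetic data: LassoLars shrinks every coefficient to exactly zero once the penalty alpha is large relative to the feature/target scales, so checking np.count_nonzero(coef_) at different alphas is a quick diagnostic (the data below is synthetic, not SPARCS):

```python
import numpy as np
from sklearn.linear_model import LassoLars

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 5 synthetic features
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

# Small penalty: Lasso keeps at least the informative feature
weights = LassoLars(alpha=0.01).fit(X, y).coef_
n_selected = np.count_nonzero(weights)

# Overly large penalty: every coefficient is shrunk to exactly 0.0
weights_big = LassoLars(alpha=100.0).fit(X, y).coef_
n_selected_big = np.count_nonzero(weights_big)
```

So an all-zero weights array is also consistent with the penalty being too large for the scale of the features, not only with the sparsity of the one-hot columns.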
Topic lasso scikit-learn feature-selection categorical-data machine-learning
Category Data Science