Dummy variables for unseen data in R
I have the following problem. When I trained my model, I created my dummy variables (before the train-test split) in the following way:
library(caret)  # dummyVars()
library(dplyr)  # %>% and mutate()
dummy <- dummyVars(formula = CLASS_INV ~ ., data = campaign_spending_final_imputed, fullRank = TRUE)
dummy %>% saveRDS('model/dummy.rds') # I save it to use it later
campaign_spending_final_dummy <- predict(dummy, newdata = campaign_spending_final_imputed) %>% as.data.frame() %>%
  mutate(CLASS_INV = campaign_spending_final$CLASS_INV)
The model was trained and tested successfully. Now I want to apply it to 'real world' data, which means creating dummy variables from a single new record. This is what I tried:
dummy_inv <- readRDS('model/dummy_inv.rds') # The file I saved above
predict(dummy_inv, single_record)
The single record has the same features as the training and test sets; it is just a single row.
However, when I call predict(), the following error occurs:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
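The same message can be reproduced in base R without caret, which suggests (this is an assumption about the cause, not something I have confirmed on my data) that a factor column in the single row carries only one level. A minimal sketch with made-up column names (channel and budget are hypothetical, not from my dataset):

```r
# Hypothetical single-row record: a factor built from one value has one level.
single_record_demo <- data.frame(
  channel = factor("tv"),
  budget  = 1200
)

# model.matrix() needs at least two levels per factor to build contrasts,
# so this call fails; tryCatch() captures the message instead of stopping.
err_msg <- tryCatch(
  {
    model.matrix(~ ., data = single_record_demo)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(err_msg)
```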
Am I proceeding in the correct way? Do I need to create a new dummyVars object? Shouldn't I use the one 'adapted' to my training data?
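One workaround I am considering, assuming the cause is that the factor columns of the single row have lost the levels seen during training, is to re-create those factors with the training levels before encoding. The sketch below uses base R's model.matrix() as a stand-in for the caret encoder, and the data frames and column names (train_demo, single_record_demo, channel, budget) are hypothetical:

```r
# Hypothetical training data with a three-level factor.
train_demo <- data.frame(
  channel = factor(c("tv", "radio", "web", "tv")),
  budget  = c(1000, 400, 250, 1500)
)

# Hypothetical new record arriving as plain character/numeric values.
single_record_demo <- data.frame(
  channel = "radio",
  budget  = 1200,
  stringsAsFactors = FALSE
)

# Re-create every factor column with the full set of training levels,
# so the single row "knows" about levels it does not contain itself.
factor_cols <- names(train_demo)[vapply(train_demo, is.factor, logical(1))]
for (col in factor_cols) {
  single_record_demo[[col]] <- factor(
    as.character(single_record_demo[[col]]),
    levels = levels(train_demo[[col]])
  )
}

# With all levels present, the encoding no longer hits the contrasts error.
encoded <- model.matrix(~ ., data = single_record_demo)
print(colnames(encoded))
```

If this is the right idea, the releveled single_record could then be passed to predict() with the saved dummyVars object instead of model.matrix(). A value never seen in training would become NA under this approach, so it would need separate handling.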
Thank you