Dummy variables for unseen data in R

I have the following problem. When I trained my model, I created my dummy variables (before the train-test split) in the following way:

library(caret)  # dummyVars()
library(dplyr)  # %>% and mutate()
dummy <- dummyVars(formula = CLASS_INV ~ ., data = campaign_spending_final_imputed, fullRank = TRUE)
dummy %>% saveRDS('model/dummy.rds') # I save it to use it later
campaign_spending_final_dummy <- predict(dummy, newdata = campaign_spending_final_imputed) %>% as.data.frame() %>%
  mutate(CLASS_INV = campaign_spending_final$CLASS_INV)

The model was trained and tested successfully. Now I want to test it on 'real world' data, so I need to create dummy variables from a single new record. This is what I tried:

dummy_inv <- readRDS('model/dummy_inv.rds') # The file I saved above
predict(dummy_inv, single_record)

The single record has the same features as the training and test sets; it is just a single row.
However, when I execute the predict function, I get the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Am I proceeding in the correct way? Do I need to create a new dummyVars object? Shouldn't I use the one 'adapted' to my training data?
Thank you


I'm a bit late but since I had the same problem, here is how I solved it.

It's not the most elegant way but I just created new columns for all the dummy variables created with the dummyVars() function, and then assigned the values manually.

So if "df" is our new data frame and x_train is the one we created with the dummyVars() function, I used:

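    # Columns of the training data that the new data already has
    existing_cols <- names(x_train)[names(x_train) %in% names(df)]
    # Dummy columns that still have to be created in the new data
    new_cols <- names(x_train)[!names(x_train) %in% names(df)]
    df[new_cols] <- 0
    # Names of the categorical (nominal) variables
    nomvars <- c("cp", "ca", "thal", "restecg", "slope")

    for (i in seq_len(nrow(df))) {
      for (v in nomvars) {
        # Dummy column name = variable name followed by the observed level
        dummy_col <- paste0(v, df[[v]][i])
        # Levels without a dummy column (e.g. the reference level) are skipped
        if (dummy_col %in% new_cols) df[i, dummy_col] <- 1
      }
    }

    # Keep only the columns that exist in the training data
    df <- df[, names(df) %in% c(existing_cols, new_cols)]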
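
An alternative that avoids filling the columns by hand: as far as I can tell, the error appears because the categorical columns of a single record contain only one level (the one value present in that row), so no contrasts can be built for them. Re-creating each factor with the levels seen during training should avoid this. Below is a minimal sketch in base R using model.matrix(); the toy data frame, its values and the name train_levels are made up for illustration and do not come from the question.

```r
# Toy training data; column names and values are hypothetical.
train <- data.frame(
  cp   = factor(c("1", "2", "3", "2")),
  thal = factor(c("a", "b", "a", "b")),
  age  = c(50, 61, 47, 58)
)

# Remember the levels seen during training for every factor column.
train_levels <- lapply(Filter(is.factor, train), levels)

# A single new record whose categorical values arrive as plain strings.
new_record <- data.frame(cp = "3", thal = "a", age = 55,
                         stringsAsFactors = FALSE)

# Re-create each factor with the full set of training levels.
for (col in names(train_levels)) {
  new_record[[col]] <- factor(new_record[[col]], levels = train_levels[[col]])
}

# Build the dummy columns; [, -1] drops the intercept column.
x_train <- model.matrix(~ ., data = train)[, -1, drop = FALSE]
x_new   <- model.matrix(~ ., data = new_record)[, -1, drop = FALSE]

# The single record now has the same dummy columns as the training data.
print(colnames(x_new))
```

With caret, the re-leveled record can presumably be passed to predict() on the saved dummyVars object in the same way, although I have not checked this against the data from the question. It would also make sense to save train_levels next to the model, for example with saveRDS(), so the levels are available at prediction time.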
