How to use a prediction model after one-hot encoding?

I have created a prediction model for this dataset:

df.head()

    Service    Tasks Difficulty     Hours
0   ABC         24     1           0.833333
1   CDE         77     1           1.750000
2   SDE         90     3           3.166667
3   QWE         47     1           1.083333
4   ASD         26     3           1.000000

df.shape
(998,4)

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = df.iloc[:,:-1]
y = df.iloc[:,-1].values
ct = ColumnTransformer([("cat", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(X)
x = X.toarray()
x = x[:,1:]

x.shape
(998,339)

from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(random_state = 1)
rf_model.fit(x,y)

How can I use this model to predict Hours for user input in the format [["SDE", 90, 3]]?

I tried:

test_input = [["SDE", 90, 3]]
test_input = ct.fit_transform(test_input)
test_input = test_input[:,1:]

test_input[0]
array([24, 1], dtype=object)


predict_hours = rf_model.predict(test_input)
ValueError

Since my dataset has many categorical values, it's not possible to enter the encoded value of SDE directly as input. I need to convert SDE to its one-hot encoded form after receiving the input [["SDE", 90, 3]].

I don't know how to do this. Can anyone help?


import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor

df.head()

    Service    Tasks Difficulty     Hours
0   ABC         24     1           0.833333
1   CDE         77     1           1.750000
2   SDE         90     3           3.166667
3   QWE         47     1           1.083333
4   ASD         26     3           1.000000

df.shape
(998,4)

X = df.drop(["Hours"],axis = 1)
y = df.Hours

ct = ColumnTransformer([("cat", OneHotEncoder(handle_unknown = "ignore"),[0])], remainder="passthrough")
    

rf_model = RandomForestRegressor(random_state = 1)

model = Pipeline([("preprocessing",ct),("model",rf_model)]).fit(X,y)

x_test = pd.DataFrame({"Service":"SDE", "Tasks":90, "Difficulty":3}, index = [0])

# Ideally you would split your data into train and test sets.
# Here, x_test is a pandas DataFrame holding the values you want to predict.
model.predict(x_test)
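
The attempt in the question most likely fails because ct.fit_transform is called on the new input, which refits the encoder on a single row, so it only knows one category and produces far fewer columns than the model was trained on. Without a Pipeline, the same fix is to fit the transformer once on the training data and call only transform on new input. The following is a minimal sketch of that idea, not the asker's exact setup: the toy DataFrame is made up to stand in for the real 998-row dataset, and sparse_threshold=0 is an added assumption so the output is always a dense array.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Service": ["ABC", "CDE", "SDE", "QWE", "ASD", "SDE"],
    "Tasks": [24, 77, 90, 47, 26, 60],
    "Difficulty": [1, 1, 3, 1, 3, 2],
    "Hours": [0.83, 1.75, 3.17, 1.08, 1.00, 2.10],
})
X = df.drop(columns=["Hours"])
y = df["Hours"]

ct = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["Service"])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: always return a dense array
)

# Fit the encoder once, on the training data only.
X_encoded = ct.fit_transform(X)
rf_model = RandomForestRegressor(random_state=1).fit(X_encoded, y)

# New user input, with the same column names as the training features.
new_input = pd.DataFrame([["SDE", 90, 3]], columns=X.columns)

# transform reuses the categories learned during fit, so the width matches.
new_encoded = ct.transform(new_input)
predicted_hours = rf_model.predict(new_encoded)

# For comparison: refitting on the single row (done on a clone so the
# fitted ct is untouched) learns only one category and yields fewer columns.
refit_encoded = clone(ct).fit_transform(new_input)

print(X_encoded.shape, new_encoded.shape, refit_encoded.shape)
```

If a column is dropped after encoding, as with x[:,1:] in the question, the same slice has to be applied to the transformed input before calling predict. Letting the Pipeline handle preprocessing avoids having to remember such steps.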
