Multiple Linear Regression on String Values
I'm working with datasets that consist mostly of string values. The main goal of the project is to predict success. I can use OneHotEncoding to convert the string values to a numerical format, but there are a lot of distinct values. I'm using multiple linear regression, and the only numerical value is the output that my model is supposed to predict.
Query 1: When encoding the string values with sklearn, won't it use up a lot of resources, since there are so many distinct values?
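For reference, a minimal sketch of what `OneHotEncoder` actually stores, assuming a recent scikit-learn and a hypothetical column of 2,000 distinct director names. By default the encoder returns a SciPy sparse matrix, so only the non-zero entries (one per row per encoded column) are kept in memory, not the full rows × categories grid.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column: 10,000 rows drawn from 2,000 distinct director names.
rng = np.random.default_rng(0)
directors = np.array([f"director_{i}" for i in range(2000)])
X = rng.choice(directors, size=(10_000, 1))

# OneHotEncoder returns a SciPy sparse matrix by default, so only the
# non-zero entries (exactly one per row for a single column) are stored.
enc = OneHotEncoder(handle_unknown="ignore")
X_sparse = enc.fit_transform(X)

dense_cells = X_sparse.shape[0] * X_sparse.shape[1]
print("shape:", X_sparse.shape)
print("stored non-zeros:", X_sparse.nnz)
print("cells if stored densely:", dense_cells)
```

The sparse matrix can be passed straight to scikit-learn linear models; the memory problem usually only appears if it is converted with `.toarray()`.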
Query 2: Will my model work if the independent variables are strings and the dependent variable is numerical? Does it need some independent variables in numerical format, or are string-format variables fine on their own?
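A minimal sketch of the situation in Query 2, using hypothetical toy data (the column names `director`, `genre` and `rating` are illustrative only). scikit-learn's `LinearRegression` only accepts numeric input, so raw strings are rejected; wrapping `OneHotEncoder` and the regression in a `Pipeline` lets an all-string feature set be used.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: string features, numeric target (rating).
df = pd.DataFrame({
    "director": ["A", "B", "A", "C", "B", "C"],
    "genre": ["Drama", "Comedy", "Drama", "Action", "Drama", "Comedy"],
    "rating": [7.8, 6.1, 8.0, 5.5, 7.0, 6.4],
})
X = df[["director", "genre"]]
y = df["rating"]

# Fitting directly on the string columns fails: the model needs numbers.
try:
    LinearRegression().fit(X, y)
except (ValueError, TypeError) as err:
    print("raw strings rejected:", err)

# One-hot encode the string columns inside the pipeline, then regress.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["director", "genre"])]
)
model = Pipeline([("encode", preprocess), ("regress", LinearRegression())])
model.fit(X, y)

# An unseen director ("D") is encoded as all zeros instead of raising an error.
new = pd.DataFrame({"director": ["A", "D"], "genre": ["Comedy", "Drama"]})
preds = model.predict(new)
print(preds)
```

So all-string independent variables are fine, as long as they are encoded to numbers before they reach the regression.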
Query 3: Is there a better alternative to OneHotEncoding?
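One commonly used alternative for high-cardinality columns is target (mean) encoding: each category is replaced by a smoothed average of the target, producing one numeric column instead of thousands. A minimal pandas sketch follows; the function name `smoothed_target_encode` and its `strength` pseudo-count are hypothetical, not a library API. Recent scikit-learn releases (1.3+) also provide `sklearn.preprocessing.TargetEncoder`, which does this with built-in cross-fitting.

```python
import pandas as pd

def smoothed_target_encode(train_col, train_y, apply_col, strength=10.0):
    """Replace each category with a shrunken mean of the target.

    Rare categories are pulled toward the global mean; categories never
    seen in training get the global mean. `strength` is a hypothetical
    pseudo-count controlling how strong that pull is.
    """
    global_mean = train_y.mean()
    stats = train_y.groupby(train_col).agg(["sum", "count"])
    mapping = (stats["sum"] + strength * global_mean) / (stats["count"] + strength)
    return apply_col.map(mapping).fillna(global_mean)

# Hypothetical toy data: director names and movie ratings.
train = pd.DataFrame({"director": ["A", "A", "B", "C"], "rating": [8.0, 7.0, 5.0, 6.0]})
test = pd.Series(["A", "Z"])  # "Z" never appears in training
encoded = smoothed_target_encode(train["director"], train["rating"], test)
print(encoded.tolist())
```

The means should be computed on the training split only (or with cross-fitting); computing them on the same rows the model is evaluated on leaks the target into the features.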
Explanation: I'm trying to use IMDb datasets to predict a movie's success from its cast, producers, genres, and a few other variables. There are about 5-6 independent variables, and the dependent variable is the movie's rating.
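Columns like cast and genres hold several values per movie, so a single category per row does not fit them. A minimal sketch of one option, assuming each movie's cast and genres are already split into Python lists (the toy records below are hypothetical): `FeatureHasher` maps every token into a fixed number of columns, so memory does not grow with the number of distinct actors, and `Ridge` adds L2 regularisation, which helps when columns far outnumber movies.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import Ridge

# Hypothetical toy movies: each has several cast members and genres.
movies = [
    {"cast": ["Actor A", "Actor B"], "genres": ["Drama"], "rating": 7.9},
    {"cast": ["Actor B", "Actor C"], "genres": ["Comedy", "Romance"], "rating": 6.2},
    {"cast": ["Actor A", "Actor D"], "genres": ["Drama", "Crime"], "rating": 8.1},
    {"cast": ["Actor C"], "genres": ["Comedy"], "rating": 5.9},
]

def to_tokens(movie):
    # Prefix each value with its column name so "cast=X" and "genres=X" stay distinct.
    return [f"cast={name}" for name in movie["cast"]] + [
        f"genres={g}" for g in movie["genres"]
    ]

# Hash every token into 2**18 columns; collisions are possible but rare.
hasher = FeatureHasher(n_features=2**18, input_type="string")
X = hasher.transform([to_tokens(m) for m in movies])
y = [m["rating"] for m in movies]

# L2-regularised linear regression on the sparse hashed features.
model = Ridge(alpha=1.0).fit(X, y)

new_movie = {"cast": ["Actor A"], "genres": ["Drama"]}
pred = model.predict(hasher.transform([to_tokens(new_movie)]))
print(pred)
```

Producers and other list-valued columns could be added to `to_tokens` with their own prefix in the same way.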
Topic one-hot-encoding linear-regression scikit-learn python machine-learning
Category Data Science