Multiple Linear Regression on String Values

I'm using datasets that consist mostly of string values. The main goal of the project is to predict success. I can use one-hot encoding to convert the string values into a numerical format, but there are a lot of distinct values. I'm using multiple linear regression, and the only numerical value is the output that my model is supposed to predict.
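A minimal sketch of the setup described above, assuming scikit-learn and pandas are available. The column names `director`, `genre` and `rating` and all the values are hypothetical toy data, not taken from the post. The string columns are one-hot encoded inside a pipeline so that exactly the same encoding is reused at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: string-valued features, numeric target (rating).
movies = pd.DataFrame({
    "director": ["A", "B", "A", "C", "B", "C"],
    "genre": ["Drama", "Comedy", "Drama", "Action", "Action", "Comedy"],
    "rating": [7.5, 6.0, 8.0, 5.5, 6.5, 6.2],
})
X = movies[["director", "genre"]]
y = movies["rating"]

# One-hot encode the string columns; categories unseen during fit
# are encoded as all zeros instead of raising an error.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["director", "genre"])]
)

# The regression only ever sees the numeric, encoded columns.
model = make_pipeline(encode, LinearRegression())
model.fit(X, y)

new_movie = pd.DataFrame({"director": ["A"], "genre": ["Comedy"]})
pred = model.predict(new_movie)
print(pred)
```

Calling `LinearRegression().fit(X, y)` directly on the raw string columns fails with a `ValueError`, because the estimator expects numeric input; the encoding step is what makes string features usable.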

  1. Query 1: When encoding the string values with sklearn, won't it use up all the available resources (memory), since there are so many distinct values?

  2. Query 2: Will my model work if the independent variables are in string format and the dependent variable is numerical? Does it need some independent variables in numerical format, or is string format fine?

  3. Query 3: Is there a different, better approach to use instead of one-hot encoding?

Explanation: I'm trying to use IMDb datasets to predict the success of a movie from its cast, producers, genres, and some other variables. There are about 5-6 independent variables, and the dependent variable is the movie's rating.
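Cast, producers and genres usually hold several values per movie, so one-hot encoding the whole string would treat every combination (for example "Action,Comedy") as its own category. A minimal sketch using `MultiLabelBinarizer` instead, assuming the values arrive as comma-separated strings (an assumption about the input format); the rows, actor names and ratings are hypothetical.

```python
import pandas as pd
from scipy import sparse
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical rows; the comma-separated layout is an assumed input format.
movies = pd.DataFrame({
    "genres": ["Drama,Romance", "Comedy", "Action,Comedy", "Drama", "Action"],
    "cast": ["actor_1,actor_2", "actor_3", "actor_1,actor_3", "actor_2", "actor_4"],
    "rating": [7.8, 6.1, 6.9, 7.2, 5.9],
})

# One indicator column per genre / actor; a movie can switch on several.
genre_mlb = MultiLabelBinarizer(sparse_output=True)
cast_mlb = MultiLabelBinarizer(sparse_output=True)
X_genres = genre_mlb.fit_transform(movies["genres"].str.split(","))
X_cast = cast_mlb.fit_transform(movies["cast"].str.split(","))

# Put the encoded blocks side by side in one sparse design matrix.
X = sparse.hstack([X_genres, X_cast]).tocsr()
model = LinearRegression().fit(X, movies["rating"])

print(genre_mlb.classes_)
print(X.shape)
```

With real IMDb data the cast block can have many thousands of columns, so the sparse output matters here for the same reason as in Query 1.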

Topic one-hot-encoding linear-regression scikit-learn python machine-learning

Category Data Science