How to combine NLP and numeric data for a linear regression problem
I'm very new to data science (this is my hello world project), and I have a data set that combines review text with numerical data such as the number of tables. There is also a rating column, which is a float (the average of all user reviews for that restaurant). So a row of data could look like:
{
rating: 3.765,
review: `Food was great, staff was friendly`,
tables: 30,
staff: 15,
parking: 20
...
}
So following tutorials, I have been able to do the following:
- Created a linear regression model to predict rating with the inputs being all the numerical data columns.
- Created a regression model to predict rating from the review text, vectorized with scikit-learn's TfidfVectorizer (sklearn.feature_extraction.text.TfidfVectorizer).
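To make the two separate models concrete, here is a minimal sketch of roughly what I have. The data frame below is made up for illustration (only the column names come from my example row), and the variable names are placeholders rather than my actual code.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression

# Toy data, invented for illustration; the real data set has more rows and columns.
df = pd.DataFrame({
    "rating": [3.765, 4.2, 2.1, 4.8],
    "review": [
        "Food was great, staff was friendly",
        "Lovely place and great food",
        "Slow service and cold food",
        "Amazing food, friendly staff",
    ],
    "tables": [30, 12, 45, 20],
    "staff": [15, 6, 10, 12],
    "parking": [20, 5, 30, 10],
})
numeric_cols = ["tables", "staff", "parking"]

# Model 1: predict rating from the numeric columns only.
numeric_model = LinearRegression().fit(df[numeric_cols], df["rating"])

# Model 2: predict rating from the review text only, vectorized with TF-IDF.
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(df["review"])  # sparse matrix, one row per review
text_model = LinearRegression().fit(X_text, df["rating"])
```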
But now I'd like to combine the two, either by combining the models or by merging both kinds of features into a single input, to create one linear regression model. So how can I use the vectorized text data alongside the numeric columns in my linear regression model?
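From what I can tell, one possible way to do this is scikit-learn's ColumnTransformer, which applies a different transformer to each column and concatenates the results into a single feature matrix. Below is a minimal sketch of that idea on made-up data. The step names ("text", "num", "features", "regress") and the choice of StandardScaler for the numeric columns are my own assumptions, not something taken from a tutorial.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration; only the column names match my data set.
df = pd.DataFrame({
    "rating": [3.765, 4.2, 2.1, 4.8],
    "review": [
        "Food was great, staff was friendly",
        "Lovely place and great food",
        "Slow service and cold food",
        "Amazing food, friendly staff",
    ],
    "tables": [30, 12, 45, 20],
    "staff": [15, 6, 10, 12],
    "parking": [20, 5, 30, 10],
})
numeric_cols = ["tables", "staff", "parking"]
X = df.drop(columns="rating")
y = df["rating"]

# Build one feature matrix: TF-IDF columns for the text plus scaled numeric columns.
features = ColumnTransformer([
    # A plain string (not a list) hands the vectorizer a 1-D column of text.
    ("text", TfidfVectorizer(), "review"),
    # Scaling is an assumption here, to put counts on a scale comparable to TF-IDF.
    ("num", StandardScaler(), numeric_cols),
])

model = Pipeline([
    ("features", features),
    ("regress", LinearRegression()),
])
model.fit(X, y)
preds = model.predict(X)  # one predicted rating per row
```

With many more TF-IDF columns than rows, a plain LinearRegression can fit the training data almost perfectly, so a regularized model such as Ridge may be a safer drop-in for the last step.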
Topic tfidf linear-regression scikit-learn nlp
Category Data Science