How can I combine metadata, vectorized text data, and image pixel data as input to build a regression model (predicting views)?
There are 4 datasets (all in CSV format), and each has a uniqueID column by which each record can be identified. The image and text datasets are dense (they need to be converted to ndarray).
Can someone suggest how to use all these 4 datasets for building a regression model?
This is what the datasets look like:
Metadata, with some input features and the target variable (views):
uniqueID  ad_blocked  embed  duration  language  hour  views
1         True        True   68        3         10    244
2         False       True   90        1         15    63
3         True        False  195       3         7     350
Vectorized title data (one entire row represents a title):
uniqueID  title_1    title_2    title_3
1         -0.977637  -0.543310  0.079403
2         0.041873   0.644655   -0.406487
3         0.503560   -0.085412  0.841144
Vectorized description data (one entire row represents a description):
uniqueID  title_1    title_2    title_3
1         -0.052256  -0.016036  0.079403
2         0.000106   0.356706   -0.025788
3         0.015774   -0.085412  0.712229
Thumbnail pixel data (one entire row represents an image):
uniqueID  image_1    image_2    image_3
1         -0.484456  -0.543310  0.032915
2         0.666147   0.644655   -0.005733
3         0.035018   -0.011111  0.841144
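To make the setup concrete, here is a minimal sketch of one possible way to join the four tables on uniqueID and get a single dense ndarray, assuming pandas and NumPy are available. The inline CSV strings only reproduce the sample rows shown above (the real files would be read with pd.read_csv on their paths), and the helper name load and the prefixes t_, d_ and img_ are made up for illustration. The prefixes are there because the title and description tables share the same column names, which would otherwise collide after the join.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question; in practice each frame would come
# from pd.read_csv(<path to the corresponding CSV file>).
metadata_csv = """uniqueID,ad_blocked,embed,duration,language,hour,views
1,True,True,68,3,10,244
2,False,True,90,1,15,63
3,True,False,195,3,7,350
"""
titles_csv = """uniqueID,title_1,title_2,title_3
1,-0.977637,-0.543310,0.079403
2,0.041873,0.644655,-0.406487
3,0.503560,-0.085412,0.841144
"""
descriptions_csv = """uniqueID,title_1,title_2,title_3
1,-0.052256,-0.016036,0.079403
2,0.000106,0.356706,-0.025788
3,0.015774,-0.085412,0.712229
"""
images_csv = """uniqueID,image_1,image_2,image_3
1,-0.484456,-0.543310,0.032915
2,0.666147,0.644655,-0.005733
3,0.035018,-0.011111,0.841144
"""


def load(csv_text, prefix=None):
    """Hypothetical helper: parse one table and index it by uniqueID."""
    df = pd.read_csv(io.StringIO(csv_text)).set_index("uniqueID")
    if prefix is not None:
        # Prefix feature columns so tables with identical names do not clash.
        df = df.add_prefix(prefix)
    return df


meta = load(metadata_csv)
titles = load(titles_csv, "t_")
descs = load(descriptions_csv, "d_")
images = load(images_csv, "img_")

# Align rows on the uniqueID index; join="inner" keeps only IDs that are
# present in all four tables.
merged = pd.concat([meta, titles, descs, images], axis=1, join="inner")

# Target vector and dense feature matrix (booleans become 0.0 / 1.0).
y = merged["views"].to_numpy(dtype=float)
X = merged.drop(columns="views").astype(float).to_numpy()

print(X.shape, y.shape)
```

X and y can then be passed to any regressor that accepts NumPy arrays. Two things this sketch does not address: language looks like a categorical code and may need one-hot encoding, and raw pixel columns are often better handled by a dedicated image model than by flat concatenation, so treat this as a baseline rather than a recommended design.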
Topic sparse regression metadata nlp python
Category Data Science