Multiple XGBoost models or just one for a certain type of category?

I am building a model to predict, say, house prices. My data contains both sales and rentals, and the target Y is the price of the sale or the rental. I also have a number of X variables to predict Y, such as the number of bedrooms, bathrooms, square meters, etc.

I believe the model will first split on the variable "sales" vs "rentals", as this would reduce the loss function (RMSE) the most.
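One way to check this is to dump the first tree and look at its root split. A minimal sketch of that check; the file name and column names here are placeholders:

```python
import pandas as pd
import xgboost as xgb

# Placeholder data: "price" is the target, "is_rental" is 1 for rentals and 0 for sales.
df = pd.read_csv("listings.csv")
X = df[["is_rental", "bedrooms", "bathrooms", "sqm"]]
y = df["price"]

model = xgb.XGBRegressor(n_estimators=200, objective="reg:squarederror")
model.fit(X, y)

# Each tree is dumped as text; the first line of the first dump is the root split.
print(model.get_booster().get_dump()[0].splitlines()[0])
```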

Do you think it is best to train two models, one for "sales" and the other for "rentals"? The RMSE of the single model is quite high, and this is partly due to the incorrect "sales" predictions.

Topic xgboost predictive-modeling machine-learning

Category Data Science


This is the main advantage of ML: if a variable has any predictive value (that is not already captured by another variable), the model will use it. So, generally speaking, it doesn't really make sense to hand-pick variables in order to train different versions of your model (although it can make sense to hand-pick some variables you want to drop). Training separate models would just be equivalent to fixing the first binary split of your first tree; you are not achieving much with that.
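In other words, if the sale/rental indicator is left in as a regular feature, the trees can split on it wherever it actually helps. A minimal sketch of that single-model setup, assuming a hypothetical `listings.csv` with `price` and `is_rental` columns:

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("listings.csv")        # assumed file containing both sales and rentals
X = df.drop(columns=["price"])          # is_rental stays in as an ordinary feature
y = df["price"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(n_estimators=500, learning_rate=0.05,
                         objective="reg:squarederror")
model.fit(X_tr, y_tr)

rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
print(f"single-model RMSE: {rmse:.0f}")

# How much the model relies on the flag relative to the other features:
print(dict(zip(X.columns, model.feature_importances_)))
```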

Edit: OK, it seems the target is not really well defined, since you aggregate monthly payments and house values into the same variable. In that case it makes sense to have two models. (Honestly, it would make even more sense not to aggregate those two distinct datasets in the first place.)
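If sale prices and monthly rents really are different targets, a simple way to keep them apart is to train one regressor per subset. A minimal sketch under the same assumed file and column names (the flag is dropped inside each subset because it is constant there):

```python
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("listings.csv")  # assumed file with an is_rental flag

models = {}
for flag, subset in df.groupby("is_rental"):
    X = subset.drop(columns=["price", "is_rental"])
    y = subset["price"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    reg = xgb.XGBRegressor(n_estimators=500, learning_rate=0.05,
                           objective="reg:squarederror")
    reg.fit(X_tr, y_tr)

    rmse = np.sqrt(mean_squared_error(y_te, reg.predict(X_te)))
    print(f"{'rentals' if flag else 'sales'} RMSE: {rmse:.0f}")
    models[flag] = reg

# At prediction time, route each listing to the model for its segment.
```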
