Predict apartment prices with two sources of prices

Question

glycine-addict

March 17, 2022 08:04

I am asking for help with the following problem.

The dataset contains two subsamples: one where the target is the real (valid) price of an apartment, and one where the target is only approximate, namely the price from ads (I do not yet know how the two differ). The goal, of course, is to predict the real price. Any ideas about what to do here? I have two so far: normalize the ad target (bring its expectation and variance in line with the real target), and modify the loss so that it penalizes errors on the real target more heavily. I have no other ideas, so I am asking for help.

Update: Sorry for being stingy with details. The task is to predict the price of an apartment as a professional realtor would appraise it. The dataset has plenty of features (such as the number of shops within some radius, the distance to the closest school, etc.) and consists of two subsets: one with prices set by realtors and one with prices taken from advertisements. The goal is to predict the price the way realtors would, but realtor appraisals are expensive, so we do not have enough of them and use the advertisement data as well. So my question is: what is the best way to treat the subset whose target values come from advertisements?
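
The two ideas mentioned in the question (rescaling the ad target and weighting the loss) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `match_moments` and `weighted_mse`, the toy prices, and the weight `5.0` are all hypothetical, and moment matching is only sensible if both subsets cover broadly similar apartments.

```python
from statistics import mean, pstdev


def match_moments(ad_prices, realtor_prices):
    """Rescale ad prices so their mean and std match the realtor prices."""
    mu_ad, sd_ad = mean(ad_prices), pstdev(ad_prices)
    mu_re, sd_re = mean(realtor_prices), pstdev(realtor_prices)
    return [(p - mu_ad) / sd_ad * sd_re + mu_re for p in ad_prices]


def weighted_mse(y_true, y_pred, is_realtor, realtor_weight=5.0):
    """Mean squared error where realtor-priced rows count more than ad rows.

    realtor_weight is a hypothetical tuning knob, not a recommended value.
    """
    weights = [realtor_weight if flag else 1.0 for flag in is_realtor]
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(w * e for w, e in zip(weights, squared)) / sum(weights)


# Toy example with made-up prices (in thousands).
ad_prices = [110.0, 120.0, 130.0]
realtor_prices = [90.0, 100.0, 110.0, 120.0]
rescaled_ads = match_moments(ad_prices, realtor_prices)

# An error on a realtor row costs more than the same error on an ad row.
loss_realtor_wrong = weighted_mse([1.0, 0.0], [0.0, 0.0], [True, False])
loss_ad_wrong = weighted_mse([0.0, 1.0], [0.0, 0.0], [True, False])
```

In practice the same weighting can often be passed to a library instead of written by hand; many scikit-learn estimators, for example, accept a `sample_weight` argument in `fit`.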

Topic target-encoding regression

Category Data Science

Brian Spiering · Accepted Answer · October 7, 2021 23:09

This is commonly called weak supervision: learning from noisy, limited, or imprecise target values.

One option is to train a surrogate model. Use the realtor prices as ground truth, then train a model that "translates" advertisement prices to mimic realtor prices.
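
One possible way to build such a translator when no apartment has both prices is sketched below. It is an assumption about how the idea could be implemented, not a definitive method: the data are synthetic, the helpers `fit_linear`, `predict_linear`, and `true_price` are hypothetical, and plain least squares (via NumPy) stands in for whatever model you would actually use.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients from fit_linear to a 2-D feature array."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def true_price(X):
    """Synthetic 'realtor' pricing rule, used only to generate toy data."""
    return 2.0 * X[:, 0] + 20.0


# Synthetic stand-in data: many ad rows, few realtor rows, ads inflated by 15%.
rng = np.random.default_rng(0)
X_ad = rng.uniform(30, 120, size=(500, 1))   # e.g. floor area
X_re = rng.uniform(30, 120, size=(40, 1))
y_ad = 1.15 * true_price(X_ad) + rng.normal(0, 5, 500)
y_re = true_price(X_re) + rng.normal(0, 2, 40)

# Step 1: learn features -> ad price on the large ad subset.
ad_model = fit_linear(X_ad, y_ad)

# Step 2: estimate an ad-style price for the realtor-priced apartments.
ad_like_re = predict_linear(ad_model, X_re)

# Step 3: the surrogate "translator": ad-style price -> realtor price.
translator = fit_linear(ad_like_re.reshape(-1, 1), y_re)

# Step 4: translate the observed ad prices into pseudo realtor prices.
y_ad_translated = predict_linear(translator, y_ad.reshape(-1, 1))

# Step 5: train the final model on realtor rows plus translated ad rows.
X_all = np.vstack([X_re, X_ad])
y_all = np.concatenate([y_re, y_ad_translated])
final_model = fit_linear(X_all, y_all)
```

If some apartments do have both an ad price and a realtor price, steps 1 and 2 are unnecessary and the translator can be fit on those pairs directly.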
