Learning the Average of a 0/1 Dependent Variable
Suppose I have a matrix $X$ and a dependent vector $y$ whose entries are each in $\{0, 1\}$, dependent on the corresponding row of $X$.
Given this dataset, I'd like to learn a model so that, given some other dataset $X'$, I could predict $\operatorname{average}(y')$ of the dependent-variable vector $y'$. Note that I'm only interested in the response at the aggregate level of the entire dataset.
One way of doing so would be to train a calibrated binary classifier $x \mapsto \Pr(y = 1 \mid x)$, apply it to the rows of $X'$, and average the predictions. However, the first step solves a harder problem (predicting each of the dependent variables individually) only to reduce the answers to their average. I'm wondering whether there is an alternative approach that is somehow better (e.g., less prone to overfitting), or a theoretical explanation of why no such alternative exists.
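For concreteness, here is a minimal sketch of that baseline in Python, assuming scikit-learn; the logistic-regression base classifier, the cross-validated calibration wrapper, and the synthetic data are illustrative choices, not part of the question.

```python
# A minimal sketch of the "classify, then average" baseline.
# Assumes scikit-learn; the base classifier is an arbitrary choice.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

def fit_average_predictor(X, y):
    """Train a calibrated binary classifier on (X, y)."""
    clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=5)
    clf.fit(X, y)
    return clf

def predict_average(clf, X_new):
    """Average the per-row probabilities P(y = 1 | x) over X_new."""
    return clf.predict_proba(X_new)[:, 1].mean()

# Usage with synthetic data (for illustration only):
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500) > 0).astype(int)
clf = fit_average_predictor(X, y)
X_new = rng.normal(size=(200, 3))
print(predict_average(clf, X_new))  # estimated average of y'
```

If the classifier is well calibrated, this average is an unbiased estimate of $\operatorname{average}(y')$; the question is whether fitting per-row probabilities is wasted effort when only their mean is wanted.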
Topic: theory, aggregation, classification
Category: Data Science