How do I go from voting per district plus census data to voting per age group?
I have some voting or polling data that is listed by voting district. I also have detailed demographics for each voting district. How can I combine these to estimate how the different demographic groups voted? I want to be able to make a chart of percent "yes" versus age or income bracket. (In the end, I want to use these relationships to try to predict the outcome in a place with different demographics.)
One approach I've seen is to treat it as a classification problem: assign each district input variables such as % male, % young, and so on, then train a classifier such as a boosted decision tree (BDT), which can be used to predict voting outcomes for other demographics. The problem I see is that this treats the whole district as one data point, so I can only indirectly get distributions of how the demographics voted. (For an example, see here: https://towardsdatascience.com/understanding-voting-outcomes-through-data-science-5d257b51ae5c)
I guess another approach would be to randomly generate data points in the form of pseudo-voters. The benefit would be that I could get not only one-dimensional distributions (vote vs. age) but also multidimensional ones (vote vs. age for different ethnicities). But I don't think the source data contains that much information, and I wouldn't even know how to create the pseudo-data. It is probably also prohibitively expensive computationally.
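The naive version of the pseudo-voter idea might look something like the sketch below. Everything in it is an assumption: the district names, age groups, and shares are invented, `make_pseudo_voters` and `yes_rate_by_age` are hypothetical helpers, and within each district the vote is drawn independently of age because the source data has no joint information.

```python
import random

# Hypothetical per-district inputs (invented numbers):
# age-group shares from the census and the overall yes share.
districts = {
    "A": {"age_shares": {"18-29": 0.2, "30-59": 0.5, "60+": 0.3}, "yes_share": 0.45},
    "B": {"age_shares": {"18-29": 0.4, "30-59": 0.4, "60+": 0.2}, "yes_share": 0.60},
}


def make_pseudo_voters(districts, n_per_district, seed=0):
    """Draw pseudo-voters whose age group follows each district's census shares."""
    rng = random.Random(seed)
    voters = []
    for name, d in districts.items():
        groups = list(d["age_shares"])
        weights = list(d["age_shares"].values())
        ages = rng.choices(groups, weights=weights, k=n_per_district)
        for age in ages:
            # Assumption: within a district, the vote is independent of age,
            # since only the district-wide yes share is known.
            voted_yes = rng.random() < d["yes_share"]
            voters.append({"district": name, "age_group": age, "voted_yes": voted_yes})
    return voters


def yes_rate_by_age(voters):
    """Pooled yes rate per age group across all pseudo-voters."""
    totals, yes = {}, {}
    for v in voters:
        g = v["age_group"]
        totals[g] = totals.get(g, 0) + 1
        yes[g] = yes.get(g, 0) + int(v["voted_yes"])
    return {g: yes[g] / totals[g] for g in totals}


voters = make_pseudo_voters(districts, n_per_district=1000)
print(yes_rate_by_age(voters))
```

Any difference between age groups in the pooled rates comes only from districts having different age mixes, which is why I doubt this adds information beyond what the district-level data already contains.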
This seems like a very standard thing to want to do, but I can't recall what the go-to technique is. (The reverse, combining census data with voting per demographic to predict results in a given place, seems straightforward.) Any suggestions?
Topic inference
Category Data Science