Training a model where each response in the observation data has a different known variance
I have a dataset where each response variable is the number of successes of N Bernoulli trials, with N and p (the probability of success) different for each observation. The goal is to train a model to predict p given the predictors. However, observations with a small N will have a higher variance than observations with a large N.
Consider the following scenario to illustrate: assume coins with different pictures on them have different biases, and that the bias depends on the picture on the coin. I have a large number of coins, each with a different picture and each with a different bias p. I want to create a model that can predict the bias of a coin given only the picture on it. I flip each coin a different number of times and record the number of successes and the total number of flips. So my dataset consists of each picture and its estimated p = successes/flips.
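To make the variance point explicit (using the standard binomial result, which I assume applies to my setup):

$$\hat{p} = \frac{\text{successes}}{\text{flips}}, \qquad \operatorname{Var}(\hat{p}) = \frac{p(1-p)}{N},$$

so the estimate from a coin flipped only a few times is much noisier than the estimate from a coin flipped many times.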
So my question is: when training my model, how should I handle this? It seems more weight should be given to observations with a larger sample size (number of flips). I don't think it makes sense to include the number of flips as a predictor variable, because the point is to build a model that predicts p using only the picture on the coin; instead, this difference in the variance of the response across observations should be taken into account when training the model.
I am using several types of models, but I am mainly working with Keras and XGBoost.
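For concreteness, here is a minimal sketch of the kind of weighting I have in mind, passing the number of flips as a per-observation weight in both libraries. The feature matrix `X` just stands in for some encoding of the picture, and all names and values below are placeholders, not my real data:

```python
import numpy as np
import xgboost as xgb
from tensorflow import keras

rng = np.random.default_rng(0)
n_obs, n_features = 1000, 16
X = rng.normal(size=(n_obs, n_features))      # placeholder "picture" features
flips = rng.integers(5, 200, size=n_obs)      # N differs per observation
successes = rng.binomial(flips, 0.5)          # placeholder outcomes
p_hat = successes / flips                     # observed proportion (the target)

# XGBoost: pass the number of flips as per-row weights
dtrain = xgb.DMatrix(X, label=p_hat, weight=flips)
booster = xgb.train({"objective": "reg:logistic"}, dtrain, num_boost_round=100)

# Keras: same idea via the sample_weight argument of fit()
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, p_hat, sample_weight=flips.astype("float32"),
          epochs=10, batch_size=32, verbose=0)
```

The thought is that weighting each observation's loss by its number of flips down-weights the noisy small-N estimates, but I'm not sure whether this is the right way to handle it.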
Topic training keras weighted-data xgboost
Category Data Science