Understanding how to simulate a statistic
This solution describes how to simulate a statistic in order to find a confidence interval. A journalist called 1000 people in town to ask which of two candidates, A or B, they will vote for. The observed result was 511 votes for A and 489 votes for B, which suggests that candidate A will win. But we need to know whether this sample is truly representative of the underlying population distribution. To find out, we simulate this poll 1000 times using the Python function below:
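import numpy as np
import pandas as pd
def sample(A, n=1000):
    # each of the n simulated voters votes 'A' with probability A, otherwise 'B'
    return pd.DataFrame({'vote': np.where(np.random.rand(n) < A, 'A', 'B')})
s = sample(0.51, n=1000)
# 1000 simulated polls: each row holds the shares of votes for A and B
dist = pd.DataFrame([sample(0.51).vote.value_counts(normalize=True) for i in range(1000)])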
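A minimal sketch of how the simulated polls in `dist` can be turned into a confidence interval, assuming a simple percentile interval (the 2.5th and 97.5th percentiles of A's simulated vote share). The `rng` argument and the switch from `np.random.rand` to NumPy's `default_rng` generator are additions for reproducibility, not part of the original code.

```python
import numpy as np
import pandas as pd

def sample(A, n=1000, rng=None):
    """Simulate one poll of n voters, each voting 'A' with probability A.

    The rng parameter is an assumption added so results are reproducible.
    """
    rng = np.random.default_rng() if rng is None else rng
    return pd.DataFrame({'vote': np.where(rng.random(n) < A, 'A', 'B')})

rng = np.random.default_rng(0)  # fixed seed so the sketch gives repeatable numbers

# Repeat the poll 1000 times; each row holds the share of votes for A and B.
dist = pd.DataFrame(
    [sample(0.51, rng=rng).vote.value_counts(normalize=True) for _ in range(1000)]
)

# Percentile interval: the middle 95% of A's simulated vote shares.
lower, upper = dist['A'].quantile([0.025, 0.975])
print(f"95% interval for A's share: [{lower:.3f}, {upper:.3f}]")

# Fraction of simulated polls in which A receives less than half of the votes.
print("Share of polls where A loses:", (dist['A'] < 0.5).mean())
```

If the interval straddles 0.5, a poll of this size cannot clearly separate the two candidates, even though A is ahead in the single observed sample.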
What I cannot understand is the significance of the parameter A in the function definition.
Is it trying to simulate a sample in which A occurs 51% of the time? If I am just trying to draw random samples from a population, why am I relying on 0.51 to do so?
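In `sample()`, `A` is the probability that a single simulated voter picks candidate A, so `sample(0.51)` draws polls from a hypothetical population in which 51% favour A. The value 0.51 is (rounded) the observed share 511/1000, so the observed result stands in for the unknown true share. The sketch below is an illustration, not the original author's code: it compares that plug-in simulation with a non-parametric bootstrap, which resamples the 1000 observed votes with replacement and never types in a probability by hand. The 1/0 vote coding and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

# The observed poll: 511 votes for A and 489 for B, coded as 1 (A) and 0 (B).
observed = np.array([1] * 511 + [0] * 489)
n_sims = 1000

# Plug-in simulation (what sample() does): every simulated voter picks A with
# probability p_hat, the share of A in the observed poll.
p_hat = observed.mean()  # 511 / 1000
parametric = (rng.random((n_sims, observed.size)) < p_hat).mean(axis=1)

# Non-parametric bootstrap: resample the observed votes with replacement.
bootstrap = rng.choice(observed, size=(n_sims, observed.size), replace=True).mean(axis=1)

for name, shares in [("plug-in", parametric), ("bootstrap", bootstrap)]:
    lo, hi = np.quantile(shares, [0.025, 0.975])
    print(f"{name:>10}: mean={shares.mean():.3f}, 95% interval=[{lo:.3f}, {hi:.3f}]")
```

Both approaches produce nearly the same spread, because resampling 1000 votes that are 51.1% "A" is equivalent to drawing voters who choose A with probability 0.511.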