Binomial family in logistic regression

I was asked in an interview why we use the binomial distribution in logistic regression and how it is related to the class that we are predicting.

Could anyone explain, without any mathematical equations, why we use the binomial distribution instead of any other distribution?



Assume that you have a time variable and, at a certain bus stop, you observe at each time whether a bus arrives or not. Let $p(t)$ denote the probability that a bus arrives at the bus stop at time $t$. Each observation is a success/failure outcome, which follows a binomial distribution with a single trial (also called a Bernoulli distribution), and logistic regression models $p(t)$ by shifting and stretching the logistic curve.
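To make "shifting and stretching" concrete, here is a minimal sketch assuming a single predictor $t$ and the usual linear form $p(t) = 1 / (1 + e^{-(a + b t)})$. The parameter names `a` (intercept) and `b` (slope) and the helper names are hypothetical, chosen only for illustration: `a` moves the curve along the time axis, and `b` controls how steep it is.

```python
import math


def logistic_p(t: float, a: float, b: float) -> float:
    """Probability that a bus arrives at time t under a logistic model.

    a (hypothetical intercept) shifts the curve left/right;
    b (hypothetical slope) stretches or compresses it.
    """
    z = a + b * t
    # Two algebraically equal forms, chosen to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def midpoint(a: float, b: float) -> float:
    """Time at which p(t) = 0.5, i.e. where a + b*t = 0."""
    return -a / b


if __name__ == "__main__":
    for t in range(0, 11, 2):
        print(t, round(logistic_p(t, a=-5.0, b=1.0), 3))
```

For example, `(a, b) = (-5, 1)` and `(a, b) = (-10, 2)` both cross 0.5 at `t = 5`, but the second curve rises from near 0 to near 1 over a shorter stretch of time.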


From Wikipedia:

..., the binomial distribution with parameters $n$ and $p$ is the discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each asking a yes–no question, and each with its own boolean-valued outcome: a random variable containing a single bit of information: success/yes/true/one (with probability $p$) or failure/no/false/zero (with probability $q = 1 − p$).
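As a small illustration of this definition, the sketch below computes the binomial probability mass function with Python's standard `math.comb`. The function name `binomial_pmf` is hypothetical. Setting $n = 1$ shows the special case used in logistic regression: one yes/no trial per observation.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent yes/no trials."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    p = 0.3
    # With n = 1 the binomial reduces to a single trial (Bernoulli):
    # P(X = 1) = p and P(X = 0) = 1 - p.
    print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))
```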

So, since logistic regression is used to model a binary output variable (i.e. one that takes the values 0 or 1, yes or no, etc.), it makes sense to base its probabilistic assumptions on a distribution that shares this feature. A binomial distribution (with a single trial per observation) is therefore a better fit than a continuous distribution such as a Gaussian or Cauchy, which would assign probability to values other than 0 and 1.
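To connect the two ideas, here is a minimal sketch of fitting a one-predictor logistic regression by maximising the Bernoulli (single-trial binomial) log-likelihood with plain gradient ascent. The toy data, the function names, and the learning-rate and step-count defaults are all hypothetical; real libraries typically use Newton-type methods (for example iteratively reweighted least squares) rather than this loop, which is written for transparency only.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))


def bernoulli_log_likelihood(xs, ys, a, b):
    """Sum of log P(y | x) when each y ~ Bernoulli(sigmoid(a + b*x))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = a + b * x
        # log p for a success, log(1 - p) = log sigmoid(-z) for a failure.
        total += y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z)
    return total


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Maximise the Bernoulli log-likelihood by gradient ascent on (a, b)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            r = y - sigmoid(a + b * x)  # observed outcome minus predicted probability
            ga += r
            gb += r * x
        a += lr * ga / n
        b += lr * gb / n
    return a, b


if __name__ == "__main__":
    # Hypothetical toy data: x is a time index, y = 1 if a bus arrived.
    xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
    a, b = fit_logistic(xs, ys)
    print(f"a = {a:.3f}, b = {b:.3f}")
    print("log-likelihood:", bernoulli_log_likelihood(xs, ys, a, b))
```

Because every fitted value passes through the logistic function, the predicted probabilities always stay strictly between 0 and 1, which is what the binomial assumption requires.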
