Binomial family in logistic regression

Question

Tejas Bawaskar

April 17, 2022, 16:04

In an interview, I was asked why we use the binomial distribution in logistic regression and how it relates to the class we are predicting.

Could anyone explain, without any mathematical equations, why we use the binomial instead of any other distribution?

Topic distribution logistic-regression classification

Category Data Science

Ahmad Bazzi · Accepted Answer · September 16, 2018, 02:06

Suppose you watch a bus stop and record, at each time $t$, whether a bus arrives or not. Let $p(t)$ denote the probability that a bus arrives at the stop at time $t$. Each observation is a success/failure outcome, which is exactly what the binomial distribution describes (a single yes/no observation is the binomial with one trial, also known as the Bernoulli distribution). Logistic regression estimates $p(t)$ by shifting and stretching the logistic curve until it fits the observed outcomes.
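To make the bus-stop picture concrete, here is a minimal Python sketch. The `shift` and `stretch` values, the time grid, and the function name `logistic` are illustrative assumptions, not something a fitted model produced; in real logistic regression these two numbers would be learned from the data.

```python
import math
import random


def logistic(t, shift=0.0, stretch=1.0):
    """Logistic curve moved sideways by `shift` and widened by `stretch`."""
    return 1.0 / (1.0 + math.exp(-(t - shift) / stretch))


# Hypothetical parameters chosen only for illustration.
shift, stretch = 5.0, 2.0
times = list(range(11))

# p(t): probability that a bus arrives at time t.
probs = [logistic(t, shift, stretch) for t in times]

# Each observation is a single yes/no draw with success probability p(t).
rng = random.Random(0)
arrivals = [1 if rng.random() < p else 0 for p in probs]

for t, p, y in zip(times, probs, arrivals):
    print(f"t={t:2d}  p(t)={p:.3f}  bus_arrived={y}")
```

The curve always stays between 0 and 1, so its output can be read as a probability, while the recorded outcomes are only ever 0 or 1.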

n1k31t4 · Accepted Answer · July 17, 2018, 19:21

From Wikipedia:

..., the binomial distribution with parameters $n$ and $\rho$ is the discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each asking a yes–no question, and each with its own boolean-valued outcome: a random variable containing a single bit of information: success/yes/true/one (with probability $\rho$) or failure/no/false/zero (with probability $q = 1 - \rho$).

Logistic regression is used to model a binary output variable (i.e. one that takes the value 0 or 1, yes or no, etc.), so it makes sense to base the probabilistic assumptions on a distribution that shares this feature. A binomial distribution is therefore a more natural choice than a continuous distribution such as a Gaussian or Cauchy, which would assign probability to values the output can never take.
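As a rough illustration of the definition quoted above, the following Python sketch computes binomial probabilities directly. The helper name `binomial_pmf` and the value `p = 0.3` are assumptions made for this example only. With a single trial, the distribution places all of its probability on the two outcomes 0 and 1, which matches the shape of a binary class label.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent yes/no trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.3  # hypothetical success probability

# One trial: only the outcomes 0 (failure) and 1 (success) are possible.
single_trial = {k: binomial_pmf(k, 1, p) for k in (0, 1)}

# Ten trials: the number of successes can be any integer from 0 to 10.
ten_trials = [binomial_pmf(k, 10, p) for k in range(11)]

print(single_trial)
print(sum(ten_trials))
```

In both cases the probabilities cover every possible count and add up to one, whereas a Gaussian or Cauchy would spread probability over all real numbers.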
