For Logistic regression, why is that particular logistic function chosen as opposed to other logistic functions?

The logistic function used in logistic regression is: $\frac{e^{B_{0} + B_{1}x}}{1 + e^{B_{0} + B_{1}x}}$. Why is this particular one used?



You can derive the logistic regression model from the assumption of a latent variable following the logistic distribution; see https://sciprincess.wordpress.com/2019/03/01/what-is-logistic-in-the-logistic-regression/


We require some link function to map a real-valued output $u \in \mathbb{R}$ to $[0,1]$ so that we may interpret it as a probability. Obviously there are many such functions, but the standard logistic (sometimes called sigmoid) is simple and convenient since its scale is log-odds, which is easy to interpret. It is also symmetric.
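To make the log-odds interpretation concrete, here is a minimal sketch of the sigmoid and its inverse (the logit), showing the round trip between a real-valued score and a probability:

```python
import math

def sigmoid(u):
    """Standard logistic function: maps any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def logit(p):
    """Inverse of the sigmoid: recovers the log-odds from a probability."""
    return math.log(p / (1.0 - p))

# The scale is log-odds: a one-unit increase in u multiplies the odds by e.
print(sigmoid(0.0))           # 0.5, i.e. even odds
print(logit(sigmoid(2.5)))    # round-trips back to 2.5 (up to float error)
print(sigmoid(-3.0) + sigmoid(3.0))  # symmetry: sigmoid(-u) = 1 - sigmoid(u)
```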

In economics, we might view $u$ as a latent utility, $$ u = f(x;\beta) + \epsilon $$ where $f(x;\beta)$ is some model of observed covariates (e.g. $f(x;\beta) = \beta'x$).

It is common to assume $\epsilon$ follows the standard logistic distribution, which is more "robust" than the normal distribution since it has fatter tails. Then we end up with exactly the canonical sigmoid link function and the logit model. If we had assumed some other distribution for $\epsilon$, for instance the normal, we would end up with the normal CDF as the link and the probit model. If we believe the errors are asymmetric and assume $\epsilon$ is Gompertz-distributed, we end up with the extreme value model.
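The "fatter tails" claim can be checked directly by comparing tail probabilities of the two latent-error distributions. A small sketch (note the standard logistic has variance $\pi^2/3$ while the standard normal has variance 1, so this comparison is qualitative, not scale-matched):

```python
import math

def logistic_cdf(x):
    # CDF of the standard logistic distribution (the sigmoid itself)
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    # CDF of the standard normal, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Tail mass beyond x: the logistic decays like exp(-x), the normal
# like exp(-x^2/2), so the logit link is less surprised by extreme
# latent errors than the probit link.
for x in (2.0, 4.0):
    print(x, 1 - logistic_cdf(x), 1 - normal_cdf(x))
```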

Different fields use different variants of the logistic function to model their problems. For instance, in epidemiology one might use a Richards growth curve (a generalized logistic function) so that infection rates start off small, then grow rapidly before saturating.
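As an illustration, here is one common parameterization of the Richards curve (parameter names here are just a convention, not from the original answer); setting the shape parameter to 1 recovers the standard logistic:

```python
import math

def richards(t, K=1.0, B=1.0, nu=1.0, Q=1.0):
    """Richards (generalized logistic) growth curve.

    K: upper asymptote, B: growth rate, nu: shape (controls asymmetry),
    Q: horizontal shift. With nu = 1 and Q = 1 this reduces to the
    standard logistic K * sigmoid(B * t).
    """
    return K / (1.0 + Q * math.exp(-B * t)) ** (1.0 / nu)

# Smaller nu pushes the inflection point later: the curve stays low
# longer before taking off, the epidemic-curve shape mentioned above.
print(richards(0.0))          # 0.5 in the standard logistic case
print(richards(0.0, nu=0.5))  # 0.25: lower value at t = 0, slower start
```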

In computer science/machine learning, for prediction problems we usually don't have an interpretation for $u$ (e.g. output of a neural network) and so we typically just use the standard logistic activation for convenience.


Generalized linear models operate on the idea that the expected value of the response, conditioned on (or parameterized by) the features, is linearly related to the features once a link function, often called $g$, is applied to that expectation. In math:

$$g(\mathbb E[Y\vert X])=X\beta$$

In logistic regression, we use the logit link $g(p)=\log\left(\dfrac{p}{1-p}\right)$. Inverting $g$ to solve for $p$ gives the activation function used in logistic regression.
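Carrying out that inversion explicitly, with $p = \mathbb E[Y\vert X]$, recovers exactly the function from the question:

$$\log\left(\frac{p}{1-p}\right) = X\beta \;\Rightarrow\; \frac{p}{1-p} = e^{X\beta} \;\Rightarrow\; p = \frac{e^{X\beta}}{1 + e^{X\beta}}$$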
