Adding a $1$ feature to each data point to avoid bias
Given a data point $x\in X,\ x\in \mathbb{R}^p$, once we solve for the parameters of the linear discriminant model, we have $\hat{B} = (X^TX)^{-1}X^TY$, where $Y \in \mathbb{R}^{N\times K}$ is the indicator response matrix. We can then produce a vector for each data point, $\hat{f}\left( x \right) =\left[ \left( 1\ x \right) \hat{B} \right] ^T$. To decide which class the data point belongs to, we use $G\left( x \right) =\arg\max_k \hat{f}_k\left( x \right)$. I noticed that every input is extended to $(p+1)$ dimensions by adding a $1$ feature, and when I asked about it I was told that it is to avoid bias; the same goes for the $(1~~x)$ in $\hat{f}\left( x \right) =\left[ \left( 1\ x \right) \hat{B} \right] ^T$.
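To make my understanding concrete, here is a minimal NumPy sketch of the setup as I read it (the three-blob toy data, class means, and seed are my own invention, not from any book): prepend a column of $1$s to get the $N\times(p+1)$ design matrix, one-hot encode the labels into $Y$, solve the least-squares problem for $\hat{B}$, and classify by $\arg\max_k$ over the fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p, K = 100, 2, 3
# Toy data (my assumption): three Gaussian blobs, one per class
means = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
labels = rng.integers(0, K, size=N)
X_raw = means[labels] + rng.normal(size=(N, p))  # N x p raw inputs

X = np.hstack([np.ones((N, 1)), X_raw])  # N x (p+1): the added 1 feature
Y = np.eye(K)[labels]                    # N x K indicator (one-hot) responses

# B_hat = (X^T X)^{-1} X^T Y, computed via a numerically stable solver
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (p+1) x K

# f_hat(x) = [(1 x) B_hat]^T for every point; G(x) = argmax_k f_hat_k(x)
F_hat = X @ B_hat               # N x K fitted values
G = F_hat.argmax(axis=1)        # predicted class per data point
print("training accuracy:", (G == labels).mean())
```

One property worth noting: with the intercept column included and $Y$ an indicator matrix, the fitted values in each row of `F_hat` sum to $1$, even though individual entries need not lie in $[0,1]$.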
Question: Can you please explain how adding the $1$ above avoids bias?
Topic bias lda-classifier classification
Category Data Science