In the CS231n lecture, can't the linear classifier be the softmax itself?

I am a little bit confused about why the scoring function $f(X, W)$ is chosen to be $WX$ while they talk about the Softmax and SVM losses.

Couldn't they take the Softmax classifier or the SVM classifier and then explain the losses?

Was there a particular need for the scoring function mentioned above?



The notation they're using is a bit funny. $f$ is just the dot product between the input and the weights, $w^T x_i$ (or $W x_i$ when the per-class weight vectors are stacked into a matrix, which is where the $WX$ in the lecture comes from). The loss function is what differentiates those classifiers: each loss takes those scores as input and uses them in a different way. All of those loss functions compare the output of $f$ with the associated label $y_i$. You compute a loss relative to a target variable to measure how well your model is performing, and then you compute the gradient of the loss function to improve the predictions. If you drop the loss function, you're just left with raw linear scores, essentially linear regression predictions (and even linear regression needs a loss function, MSE, to fit its weights), which aren't being evaluated relative to anything.
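To make this concrete, here is a minimal NumPy sketch (with made-up toy numbers, not taken from the lecture) showing that both classifiers consume exactly the same linear scores $Wx_i$ and differ only in the loss they compute from them:

```python
import numpy as np

# Hypothetical toy setup: 3 classes, 4-dimensional input.
np.random.seed(0)
W = np.random.randn(3, 4) * 0.01   # one row of weights per class
x = np.random.randn(4)             # a single input vector
y = 2                              # index of the correct class

# The score function f(x, W) = Wx is shared by both classifiers.
scores = W.dot(x)

# Softmax (cross-entropy) loss: treat scores as unnormalized log-probabilities.
shifted = scores - np.max(scores)                  # shift for numerical stability
probs = np.exp(shifted) / np.sum(np.exp(shifted))
softmax_loss = -np.log(probs[y])

# Multiclass SVM (hinge) loss: the correct class score should exceed
# every other score by at least a margin (delta = 1 here).
margins = np.maximum(0, scores - scores[y] + 1.0)
margins[y] = 0                                     # don't count the correct class
svm_loss = np.sum(margins)

print(softmax_loss, svm_loss)
```

So "Softmax classifier" and "SVM classifier" are not different score functions; they are the same linear map $f(x_i, W) = W x_i$ paired with different losses, and it's the gradient of each loss with respect to $W$ that drives training in each case.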
