In the CS231n lecture, can't the linear classifier be the softmax itself?

I am a little bit confused about why the scoring function $f(X, W)$ is chosen to be $WX$ while they talk about the Softmax and SVM losses.

Couldn't they take the Softmax classifier or the SVM classifier and then explain the losses?

Was there a particular need for the scoring function mentioned above?



The notation they're using is a bit funny. $f$ is just the dot product between the input and the weights, $w^T x_i$ (or $W x_i$ when the per-class weight vectors are stacked into a matrix, which is where the $WX$ in the lecture comes from). The loss function is what differentiates those classifiers: each loss takes those scores as input and uses them in a different way. All of those loss functions compare the output of $f$ with the associated label $y_i$. You compute a loss relative to a target variable to measure how well your model is performing, and then you compute the gradient of the loss function to improve the predictions. If you drop the loss function, you're just left with raw linear scores, essentially linear regression predictions (and even linear regression needs a loss function, MSE, to fit its weights), which aren't being evaluated relative to anything.
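To make this concrete, here is a minimal NumPy sketch (with made-up toy numbers, not taken from the lecture) showing that both classifiers consume exactly the same linear scores $Wx_i$ and differ only in the loss they compute from them:

```python
import numpy as np

# Hypothetical toy setup: 3 classes, 4-dimensional input.
np.random.seed(0)
W = np.random.randn(3, 4) * 0.01   # one row of weights per class
x = np.random.randn(4)             # a single input vector
y = 2                              # index of the correct class

# The score function f(x, W) = Wx is shared by both classifiers.
scores = W.dot(x)

# Softmax (cross-entropy) loss: treat scores as unnormalized log-probabilities.
shifted = scores - np.max(scores)                  # shift for numerical stability
probs = np.exp(shifted) / np.sum(np.exp(shifted))
softmax_loss = -np.log(probs[y])

# Multiclass SVM (hinge) loss: the correct class score should exceed
# every other score by at least a margin (delta = 1 here).
margins = np.maximum(0, scores - scores[y] + 1.0)
margins[y] = 0                                     # don't count the correct class
svm_loss = np.sum(margins)

print(softmax_loss, svm_loss)
```

So "Softmax classifier" and "SVM classifier" are not different score functions; they are the same linear map $f(x_i, W) = W x_i$ paired with different losses, and it's the gradient of each loss with respect to $W$ that drives training in each case.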
