In the CS231n lecture, can't the linear classifier be softmax itself?
I am a little confused about why the scoring function $f(X, W)$ is chosen to be $WX$ when the lecture then goes on to discuss the Softmax and SVM losses.
Couldn't they start from a Softmax classifier or an SVM classifier and then explain the losses?
Was there a particular reason for choosing the scoring function above?
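
To make the setup concrete, here is a minimal sketch of how I understand it: one linear scoring function produces raw class scores, and the Softmax (cross-entropy) loss and the multiclass SVM (hinge) loss are both computed from those same scores. The function names `scores`, `softmax_loss`, and `svm_loss`, the margin `delta=1.0`, and the toy values for `W` and `x` are my own assumptions for illustration, not code from the lecture.

```python
import math

def scores(W, x):
    """Linear scoring function f(x, W) = W x: one raw score per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax_loss(s, y):
    """Cross-entropy loss computed from scores s for correct class y."""
    m = max(s)  # shift by the max score for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in s))
    return log_sum_exp - s[y]

def svm_loss(s, y, delta=1.0):
    """Multiclass hinge loss computed from the same scores s."""
    return sum(max(0.0, s[j] - s[y] + delta) for j in range(len(s)) if j != y)

# Hypothetical toy example: 3 classes, 2 features (values are made up).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]
s = scores(W, x)

print("scores:", s)
print("softmax loss (y=2):", softmax_loss(s, 2))
print("svm loss (y=2):", svm_loss(s, 2))
```

In this sketch both losses take the output of `scores` as input, which is the part of the lecture's setup my question is about.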
Topic information-theory deep-learning machine-learning
Category Data Science