Why does applying StandardScaler to my TF-IDF matrix make my classifier perform better?
I have a TF-IDF matrix built from a list of tweets in the dataset I am using. I set up a pipeline that first instantiates a StandardScaler and then uses an SVM with a linear kernel and gamma set to 'auto' as the classifier.
This is pretty much what is done here in the examples section. With the pipeline, the classifier reaches an F1 score of 87. Without it (training the SVM directly on the unscaled TF-IDF matrix), it scores a dismal 53.
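A minimal sketch of the two set-ups being compared, assuming scikit-learn, a precomputed sparse TF-IDF matrix, and a hypothetical helper `compare_f1`. `with_mean=False` is an assumption: StandardScaler refuses to centre sparse input, so the original pipeline may instead have converted the matrix to a dense array. Macro-averaged F1 is also an assumption, not necessarily the metric behind the 87 and 53 figures.

```python
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_f1(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit the SVM with and without scaling and
    return the F1 score of each on the test split."""
    models = {
        # Scaler first, then the SVM, as described in the question.
        # with_mean=False keeps the sparse TF-IDF matrix sparse (assumption).
        "scaled": Pipeline([
            ("scaler", StandardScaler(with_mean=False)),
            ("svm", SVC(kernel="linear", gamma="auto")),
        ]),
        # The same SVM trained directly on the raw TF-IDF values.
        # gamma has no effect with a linear kernel; kept to mirror the question.
        "unscaled": SVC(kernel="linear", gamma="auto"),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        # Macro averaging is an assumption; it also works for multi-class labels.
        scores[name] = f1_score(y_test, preds, average="macro")
    return scores
```

Calling `compare_f1(X_train, y_train, X_test, y_test)` returns a dict with a `"scaled"` and an `"unscaled"` score, so both runs are evaluated on exactly the same split.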
Why is this?
I thought TF-IDF values were already normalised two-fold, so shouldn't StandardScaler have no effect, since it just normalises them again?
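One way to check what each step actually normalises is a small sketch, assuming scikit-learn's TfidfVectorizer with its default settings (`norm="l2"`), StandardScaler with `with_mean=False`, and a hypothetical helper `row_and_column_stats`. It reports the length of each document vector (row) after TF-IDF and the standard deviation of each term (column) before and after scaling, since the two steps operate along different axes of the matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler


def row_and_column_stats(texts):
    """Hypothetical helper contrasting TfidfVectorizer's per-document
    normalisation with StandardScaler's per-feature standardisation."""
    # norm="l2" is the default: each ROW (document) is scaled to unit length.
    X = TfidfVectorizer().fit_transform(texts)

    # Euclidean length of each document vector.
    row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1))).ravel()

    # StandardScaler works per COLUMN (term). With with_mean=False each
    # column is divided by its standard deviation but not centred.
    X_scaled = StandardScaler(with_mean=False).fit_transform(X)

    col_std_before = X.toarray().std(axis=0)
    col_std_after = X_scaled.toarray().std(axis=0)
    return row_norms, col_std_before, col_std_after


if __name__ == "__main__":
    rows, before, after = row_and_column_stats(
        ["the cat sat on the mat", "the dog chased the cat", "a bird sang"]
    )
    print("row norms:", rows)
    print("column std before scaling:", before)
    print("column std after scaling:", after)
```

The row norms all come out as 1 both before and after TF-IDF weighting is applied, while the per-term standard deviations vary until StandardScaler sets each of them to 1, so the scaler is not simply repeating the TF-IDF normalisation.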
Tags: tfidf, classification, svm
Category: Data Science