How to compute constant c for PCA features before SGDClassifier as advised in Scikit documentation?
In the documentation for SGDClassifier here, it is stated:

"If you apply SGD to features extracted using PCA we found that it is often wise to scale the feature values by some constant c such that the average L2 norm of the training data equals one."
- Given a dummy training dataset such as

import numpy as np
data = np.random.rand(3, 3)

how can I compute c and scale the feature values?
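For what it's worth, here is a minimal sketch of one way I understand that advice (this interpretation is my own assumption, not something the documentation spells out): take the mean L2 norm over the rows (samples) and set c to its reciprocal, so the scaled data has average L2 norm equal to one.

```python
import numpy as np

# Dummy training data, as in the question.
rng = np.random.RandomState(0)
data = rng.rand(3, 3)

# Average L2 norm over the samples (rows) of the training data.
avg_norm = np.linalg.norm(data, axis=1).mean()

# Choose c so that the average L2 norm of c * data equals one.
c = 1.0 / avg_norm
scaled = c * data

print(np.linalg.norm(scaled, axis=1).mean())  # ≈ 1.0
```

Whether the norm should be averaged per sample (rows) rather than per feature (columns) is exactly the kind of detail I am unsure about here.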
- I am using IncrementalPCA before SGDClassifier(loss='log'). Should I compute c after every batch's partial_fit and transform, and scale the transformed data by c before feeding it into the SGDClassifier?
On a side note, there is a similar question on this forum here; however, it has no answer. I have also asked this question in scikit-learn's GitHub discussions here, but there is no answer there either.
Thank you for your kind help.
Topic sgd pca scikit-learn
Category Data Science