How to compute constant c for PCA features before SGDClassifier as advised in Scikit documentation?
In the documentation for SGDClassifier here, it is stated:

"If you apply SGD to features extracted using PCA we found that it is often wise to scale the feature values by some constant c such that the average L2 norm of the training data equals one."
- Given a dummy training dataset such as

import numpy as np
data = np.random.rand(3, 3)

how can I compute c and scale the feature values?
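For what it's worth, here is a minimal sketch of one way I understand that advice (this interpretation is my own assumption, not something the documentation spells out): take the mean L2 norm over the rows (samples) and set c to its reciprocal, so the scaled data has average L2 norm equal to one.

```python
import numpy as np

# Dummy training data, as in the question.
rng = np.random.RandomState(0)
data = rng.rand(3, 3)

# Average L2 norm over the samples (rows) of the training data.
avg_norm = np.linalg.norm(data, axis=1).mean()

# Choose c so that the average L2 norm of c * data equals one.
c = 1.0 / avg_norm
scaled = c * data

print(np.linalg.norm(scaled, axis=1).mean())  # ≈ 1.0
```

Whether the norm should be averaged per sample (rows) rather than per feature (columns) is exactly the kind of detail I am unsure about here.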
- I am using IncrementalPCA before SGDClassifier(loss='log'). Should I compute c after every batch's partial_fit and transform, and scale the transformed data by c before feeding it into the SGDClassifier?
On a side note, there is a similar question on this forum here; however, it has no answer. I have also asked this question in scikit-learn's GitHub discussions here, but there is no answer there either.
Thank you for your kind help.
Topic sgd pca scikit-learn
Category Data Science