Python sklearn PCA transform function output does not match

I am computing PCA on some data with 10 components, then projecting onto the first 3 of the 10 components as:

from sklearn.decomposition import PCA
import numpy

transformer = PCA(n_components=10)
trained = transformer.fit(train)
one = numpy.matmul(train, numpy.transpose(trained.components_[:3, :]))

Here trained.components_[:3, :] is:

array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,
         5.26069645e-02,  2.42638594e-01,  1.20957807e-02,
         1.30595572e-01,  1.09279646e-02,  7.21299808e-03,
        -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
         5.42890317e-01,  8.50422194e-02,  1.80935205e-01,
         2.98473275e-05, -8.04537378e-04],
       [-1.05419313e-02,  3.09442577e-01, -8.15534934e-02,
         4.28621520e-03,  2.93323569e-01,  3.85849115e-02,
        -1.16193185e-01,  4.14964652e-01,  4.16279154e-01,
         2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
        -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,
         5.28389303e-05, -2.07920000e-03],
       [ 8.63072772e-03, -3.26129082e-01,  8.59869400e-02,
         3.04770780e-03, -3.14966419e-01, -2.47151330e-02,
         1.05987767e-01,  3.74235953e-01,  3.75747065e-01,
         2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
         2.76535177e-02, -1.51485057e-01, -4.48558170e-01,
        -8.83328996e-05, -2.25542180e-03]])

and using only 3 components as:

transformer = PCA(n_components=3)
trained = transformer.fit(train)
two = trained.transform(train)

Here the components are:

array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,
         5.26069645e-02,  2.42638594e-01,  1.20957807e-02,
         1.30595572e-01,  1.09279646e-02,  7.21299808e-03,
        -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
         5.42890317e-01,  8.50422194e-02,  1.80935205e-01,
         2.98473275e-05, -8.04537377e-04],
       [-1.05419314e-02,  3.09442577e-01, -8.15534934e-02,
         4.28621520e-03,  2.93323569e-01,  3.85849115e-02,
        -1.16193185e-01,  4.14964652e-01,  4.16279154e-01,
         2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
        -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,
         5.28389307e-05, -2.07919994e-03],
       [ 8.63072765e-03, -3.26129082e-01,  8.59869400e-02,
         3.04770780e-03, -3.14966419e-01, -2.47151331e-02,
         1.05987767e-01,  3.74235953e-01,  3.75747065e-01,
         2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
         2.76535177e-02, -1.51485057e-01, -4.48558170e-01,
        -8.83328994e-05, -2.25542175e-03]])

But one does not come out equal to two, even though the components are the same in both cases. The results differ because the transform function first subtracts the mean vector from the original data and only then multiplies by the components. But why should the mean be subtracted here, when it was already subtracted in the first step, while computing the PCA basis?
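The mismatch can be reproduced on synthetic data (the random train below is a stand-in for the original data, which is not shown in the question); centering with the fitted mean_ attribute makes the manual projection match transform:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for `train`: any real-valued matrix works.
rng = np.random.RandomState(0)
train = rng.rand(100, 17)

transformer = PCA(n_components=3)
trained = transformer.fit(train)
two = trained.transform(train)

# Multiplying the raw data by the components does NOT reproduce `two`...
one = train @ trained.components_.T
print(np.allclose(one, two))  # False

# ...but subtracting the fitted mean first does, because transform()
# centers the data with `mean_` before projecting.
one_centered = (train - trained.mean_) @ trained.components_.T
print(np.allclose(one_centered, two))  # True
```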

Tags: pca, scikit-learn, python

Category: Data Science


If you look at the source code, the PCA is calculated through the SVD of the mean-centered data; transform therefore centers the input with the fitted mean before projecting, which is why one and two differ. (For the randomized solver, I believe the SVD is approximated iteratively until it is "good enough.")

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py
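As a quick check of the SVD claim, the principal axes can be recovered directly with numpy.linalg.svd on the centered data (again using a synthetic stand-in for train; the rows of Vt match components_ up to sign, since the sign of singular vectors is arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for `train`.
rng = np.random.RandomState(0)
train = rng.rand(100, 17)

trained = PCA(n_components=3).fit(train)

# Full SVD of the *centered* data: rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(train - train.mean(axis=0), full_matrices=False)

# sklearn applies a deterministic sign flip to each component, so
# compare absolute values to ignore the arbitrary SVD sign.
print(np.allclose(np.abs(Vt[:3]), np.abs(trained.components_)))  # True
```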
