Python sklearn PCA transform function output does not match

I am computing PCA on some data with 10 components, then projecting onto the first 3 of the 10 components as:

from sklearn.decomposition import PCA
import numpy

transformer = PCA(n_components=10)
trained = transformer.fit(train)
one = numpy.matmul(train, numpy.transpose(trained.components_[:3, :]))

Here trained.components_[:3, :] is:

array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,
         5.26069645e-02,  2.42638594e-01,  1.20957807e-02,
         1.30595572e-01,  1.09279646e-02,  7.21299808e-03,
        -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
         5.42890317e-01,  8.50422194e-02,  1.80935205e-01,
         2.98473275e-05, -8.04537378e-04],
       [-1.05419313e-02,  3.09442577e-01, -8.15534934e-02,
         4.28621520e-03,  2.93323569e-01,  3.85849115e-02,
        -1.16193185e-01,  4.14964652e-01,  4.16279154e-01,
         2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
        -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,
         5.28389303e-05, -2.07920000e-03],
       [ 8.63072772e-03, -3.26129082e-01,  8.59869400e-02,
         3.04770780e-03, -3.14966419e-01, -2.47151330e-02,
         1.05987767e-01,  3.74235953e-01,  3.75747065e-01,
         2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
         2.76535177e-02, -1.51485057e-01, -4.48558170e-01,
        -8.83328996e-05, -2.25542180e-03]])

and using only 3 components as:

transformer = PCA(n_components=3)
trained = transformer.fit(train)
two = trained.transform(train)

Here the components are:

array([[-1.43311999e-03,  1.65635865e-01,  5.49189565e-01,
         5.26069645e-02,  2.42638594e-01,  1.20957807e-02,
         1.30595572e-01,  1.09279646e-02,  7.21299808e-03,
        -2.79057934e-02, -1.14834589e-02,  5.06289160e-01,
         5.42890317e-01,  8.50422194e-02,  1.80935205e-01,
         2.98473275e-05, -8.04537377e-04],
       [-1.05419314e-02,  3.09442577e-01, -8.15534934e-02,
         4.28621520e-03,  2.93323569e-01,  3.85849115e-02,
        -1.16193185e-01,  4.14964652e-01,  4.16279154e-01,
         2.95264788e-01,  3.28620106e-01, -2.60916490e-01,
        -2.37459426e-02,  1.57567265e-01,  4.02873342e-01,
         5.28389307e-05, -2.07919994e-03],
       [ 8.63072765e-03, -3.26129082e-01,  8.59869400e-02,
         3.04770780e-03, -3.14966419e-01, -2.47151331e-02,
         1.05987767e-01,  3.74235953e-01,  3.75747065e-01,
         2.76035253e-01,  3.18273743e-01,  3.02423861e-01,
         2.76535177e-02, -1.51485057e-01, -4.48558170e-01,
        -8.83328994e-05, -2.25542175e-03]])

But one does not come out equal to two, even though the components are the same in both cases. The results differ because the transform function first subtracts the mean vector from the original data and only then multiplies by the components. But why should the mean be subtracted here, when it was already subtracted in the first step, while computing the PCA basis?
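The mismatch can be reproduced on synthetic data (the random train below is a stand-in for the original data, which is not shown in the question); centering with the fitted mean_ attribute makes the manual projection match transform:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for `train`: any real-valued matrix works.
rng = np.random.RandomState(0)
train = rng.rand(100, 17)

transformer = PCA(n_components=3)
trained = transformer.fit(train)
two = trained.transform(train)

# Multiplying the raw data by the components does NOT reproduce `two`...
one = train @ trained.components_.T
print(np.allclose(one, two))  # False

# ...but subtracting the fitted mean first does, because transform()
# centers the data with `mean_` before projecting.
one_centered = (train - trained.mean_) @ trained.components_.T
print(np.allclose(one_centered, two))  # True
```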

Tags: pca, scikit-learn, python

Category: Data Science


If you look at the source code, the PCA is calculated through the SVD of the mean-centered data; transform therefore centers the input with the fitted mean before projecting, which is why one and two differ. (For the randomized solver, I believe the SVD is approximated iteratively until it is "good enough.")

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py
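As a quick check of the SVD claim, the principal axes can be recovered directly with numpy.linalg.svd on the centered data (again using a synthetic stand-in for train; the rows of Vt match components_ up to sign, since the sign of singular vectors is arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for `train`.
rng = np.random.RandomState(0)
train = rng.rand(100, 17)

trained = PCA(n_components=3).fit(train)

# Full SVD of the *centered* data: rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(train - train.mean(axis=0), full_matrices=False)

# sklearn applies a deterministic sign flip to each component, so
# compare absolute values to ignore the arbitrary SVD sign.
print(np.allclose(np.abs(Vt[:3]), np.abs(trained.components_)))  # True
```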
