Hard time finding literature on feature clustering using Principal Component Analysis

I'm new to StackExchange, so I am sorry if this is not the right way to ask a question on StackExchange.

For my thesis I wish to propose a method for future research: using PCA to cluster features (feature clustering) and then applying PCA per cluster. I got the idea from this paper. But I am having a hard time finding literature about PCA being used to cluster variables (rather than to reduce them). I can imagine that PCA is not ideal for clustering variables, but I would still like to propose the method. Do any of you know of any literature, articles, books, etc.?
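To make the proposal concrete, here is a minimal sketch of the pipeline I have in mind, assuming the feature clusters are already given (the function name, data, and labels are hypothetical, not from any specific paper):

```python
import numpy as np
from sklearn.decomposition import PCA

def per_cluster_pca(X, feature_labels, n_components=1):
    """Fit a separate PCA on each cluster of features and concatenate
    the resulting component scores."""
    blocks = []
    for c in np.unique(feature_labels):
        cols = np.where(feature_labels == c)[0]
        n_comp = min(n_components, len(cols))  # a cluster may be small
        blocks.append(PCA(n_components=n_comp).fit_transform(X[:, cols]))
    return np.hstack(blocks)

# Hypothetical data: features 0-2 form one cluster, features 3-4 another.
X = np.random.default_rng(0).normal(size=(100, 5))
labels = np.array([0, 0, 0, 1, 1])
Z = per_cluster_pca(X, labels, n_components=1)
print(Z.shape)  # (100, 2): one component per feature cluster
```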

Tags: features, pca, clustering

Category: Data Science


It looks like you want to go for unsupervised methods for feature selection. You can use PCA, but it might not be that effective.

I would suggest going through these links:

  1. https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/

  2. https://scikit-learn.org/stable/modules/feature_selection.html

  3. https://www.ijcai.org/Proceedings/13/Papers/241.pdf

  4. https://stats.stackexchange.com/questions/108743/methods-in-r-or-python-to-perform-feature-selection-in-unsupervised-learning

There is a method called Principal Feature Analysis in link 4; you can have a look at that. If you are using R, there is the sparcl package for sparse clustering.
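For reference, here is a minimal sketch of the Principal Feature Analysis idea, assuming its usual formulation (cluster the features by their PCA loadings with k-means, then keep the feature nearest each cluster centre as its representative); the function name and parameter choices are mine, not from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def principal_feature_analysis(X, n_clusters, n_components=None):
    """Cluster features by their PCA loadings; return one cluster label
    per feature plus the index of a representative feature per cluster."""
    pca = PCA(n_components=n_components).fit(X)
    loadings = pca.components_.T              # row i = feature i in PC space
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(loadings)
    dists = km.transform(loadings)            # feature-to-centre distances
    representatives = dists.argmin(axis=0)    # nearest feature to each centre
    return km.labels_, representatives

# Toy usage with random data and 5 features:
X = np.random.default_rng(0).normal(size=(100, 5))
labels, reps = principal_feature_analysis(X, n_clusters=2)
print("cluster per feature:", labels)
print("representative features:", reps)
```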


I am not sure PCA is quite what you are after. It may help to visualize the goal. I picture the following for 5 features and 2 records (i.e. 2 rows, 5 columns):

[Figure: the five features A, B, C, D, E drawn as vectors in the 2D record space]

Here, because there are only two records, features are 2D vectors. Would you be after capturing A, B, C in one cluster and D, E in another?

If so, I think you should simply do an eigenvalue decomposition of the covariance matrix. That will give you the eigenvectors along which you have the greatest variance. For the example above, I would have a 5x5 covariance matrix, with two eigenvectors that have large eigenvalues and three more with very small eigenvalues.

The eigenvectors with large eigenvalues are then your clustering targets: project your feature vectors onto these eigenvectors. E.g., if $V_1$ and $V_2$ are your two eigenvectors, compute the magnitudes of the dot products $\left|A \cdot V_1\right|$ and $\left|A \cdot V_2\right|$, then assign feature $A$ to cluster 1 if $\left|A \cdot V_1\right| > \left|A \cdot V_2\right|$ and vice versa. Depending on your data, it may be a good idea to group features that are aligned with eigenvectors that have very similar eigenvalues (as a form of regularization).
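Here is a minimal NumPy sketch of that assignment rule, computed via SVD of the centered data (which is equivalent to diagonalizing the covariance matrix, as noted in the PS below). The data are made up, with more than two records so that the second direction is not degenerate: features A, B, C follow one latent factor and D, E another.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 records, 5 features; A, B, C track one latent
# factor, D and E track another.
n = 20
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1, f1, f1, f2, f2]) + 0.1 * rng.normal(size=(n, 5))

Xc = X - X.mean(axis=0)             # center each feature
# SVD of the centered data; the columns of U are the directions of
# greatest variance in record space, where the feature columns live.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                               # number of large eigenvalues kept
scores = np.abs(U[:, :k].T @ Xc)    # |feature . V_j| for each feature
clusters = scores.argmax(axis=0)    # dominant direction per feature

for name, c in zip("ABCDE", clusters):
    print(f"feature {name} -> cluster {c + 1}")
```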

PS: PCA does similar things but is geared toward a different purpose (i.e. some of the information provided by the SVD of a design matrix is similar to what you get from diagonalization of the covariance matrix).
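To make that connection concrete: if $X$ is the centered $n \times p$ design matrix with SVD $X = U \Sigma V^\top$, then

$$\frac{1}{n-1} X^\top X = V \, \frac{\Sigma^2}{n-1} \, V^\top,$$

so the right singular vectors $V$ are exactly the eigenvectors of the sample covariance matrix, with eigenvalues $\sigma_j^2 / (n-1)$.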
