Assessing Group Similarities and Dissimilarities Post PCA
The goal is to assess similarity and dissimilarity between 6 known groups.
The original data consist of the 6 known groups and 2,700+ variables, all on a scale of 0 to 100.
I performed PCA with the dudi.pca function from the ade4 package in R, reducing the 2,700+ variables to 5 principal components. Here are the eigenvalues of the components:
      eigenvalue  variance.percent  cumulative.variance.percent
Dim.1   998.3274         36.635867                     36.63587
Dim.2   670.1278         24.591848                     61.22771
Dim.3   482.2372         17.696776                     78.92449
Dim.4   352.2806         12.927728                     91.85222
Dim.5   222.0270          8.147781                    100.00000
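As a sanity check, the two percentage columns can be recomputed from the eigenvalues alone. This is a minimal base-R sketch; the names `eig`, `pct`, `cum` and `eig_table` are illustrative and do not come from ade4.

```r
# Eigenvalues copied from the table above
eig <- c(Dim.1 = 998.3274, Dim.2 = 670.1278, Dim.3 = 482.2372,
         Dim.4 = 352.2806, Dim.5 = 222.0270)

# Share of the total variance carried by each component, in percent
pct <- 100 * eig / sum(eig)

# Running total of those percentages
cum <- cumsum(pct)

eig_table <- data.frame(eigenvalue = eig,
                        variance.percent = pct,
                        cumulative.variance.percent = cum)
print(round(eig_table, 3))
```

The eigenvalues sum to about 2725. That would be consistent with a PCA on standardized variables, where the total variance equals the number of variables, but this is an inference from the numbers rather than something I have verified.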
I would now like to assess the distances between the 6 known groups. Is this as simple as generating a distance matrix from each group's coordinates on the principal components? If so, I am leaning towards Manhattan distance, to get the absolute distance.
Here are the coordinates of each group:
             Dim.1       Dim.2        Dim.3       Dim.4        Dim.5
Group 1  69.019038    7.940190    0.4985599   -6.847178    0.3964117
Group 2 -16.302322  -25.965373  -29.3084201  -23.013430    9.9183010
Group 3 -26.313850   50.159662    6.9486408  -10.713924    5.2883152
Group 4 -12.800767  -26.211432   39.5067264   -8.775551   -8.8840592
Group 5  -9.228404    2.648632  -20.4297314   16.685426  -26.8559444
Group 6  -4.373694   -8.571679    2.7842244   32.664657   20.1369757
If not, what would be the appropriate way to assess similarity/dissimilarity between the individual groups post PCA?
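For concreteness, the distance-matrix approach described above could be sketched as follows, using base R's `dist()` on the group coordinates. The values are entered as signed coordinates (each column sums to roughly zero, as centered PCA scores should), and the names `coords`, `d_manhattan` and `d_euclidean` are illustrative. Euclidean distance is included only for comparison: PCA is a rotation of the axes, so Euclidean distances are unaffected by it, whereas Manhattan distances depend on how the axes are oriented.

```r
# Group coordinates on the 5 principal components, copied from the table above
coords <- matrix(
  c( 69.019038,   7.940190,   0.4985599,  -6.847178,   0.3964117,
    -16.302322, -25.965373, -29.3084201, -23.013430,   9.9183010,
    -26.313850,  50.159662,   6.9486408, -10.713924,   5.2883152,
    -12.800767, -26.211432,  39.5067264,  -8.775551,  -8.8840592,
     -9.228404,   2.648632, -20.4297314,  16.685426, -26.8559444,
     -4.373694,  -8.571679,   2.7842244,  32.664657,  20.1369757),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste("Group", 1:6), paste0("Dim.", 1:5))
)

# Pairwise distances between the rows (groups)
d_manhattan <- dist(coords, method = "manhattan")
d_euclidean <- dist(coords, method = "euclidean")

# Full symmetric matrices are easier to read than the lower triangle
print(round(as.matrix(d_manhattan), 2))
print(round(as.matrix(d_euclidean), 2))
```

Since the 5 components account for 100% of the variance, the Euclidean distances here should match those computed on the centered (and, if scaling was applied, standardized) original data. The Manhattan distances carry no such guarantee.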
Topic pca distance similarity r
Category Data Science