Agglomerative Clustering (average linkage) and Pearson Correlation
Does having a positive or negative correlation between features being clustered affect the agglomerative clustering result?
I have three columns in my dataset, and I'm trying to figure out if I should cluster on all three features or use only a subset.
The Pearson correlation coefficients are:
X Z -- -0.07, p=0.14
X Y -- -0.08, p=0.08
Z Y -- 0.68, p<0.001
The Variance Inflation Factor is:
variables VIF
Y 2.816716
X 3.552227
Z 6.232414
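For reference, here is a minimal sketch of how figures like the ones above can be computed. It runs on synthetic stand-in data, not my real dataset, so the printed numbers will not match those listed. The column names X, Y, Z and the way the fake data is generated are assumptions made only for illustration. The VIF here is the version with an intercept, computed as the diagonal of the inverse correlation matrix; other tools may define or compute it differently.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the real dataset): Y and Z are built to be
# positively correlated, X is independent noise.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.7 * z + 0.7 * rng.normal(size=n)
data = {"X": x, "Y": y, "Z": z}
names = list(data)

# Pairwise Pearson correlation coefficients with two-sided p-values.
pearson = {}
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        r, p = stats.pearsonr(data[a], data[b])
        pearson[(a, b)] = (r, p)
        print(f"{a} {b} -- r={r:.2f}, p={p:.3g}")

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the
# remaining features (with an intercept). This equals the j-th diagonal entry
# of the inverse of the correlation matrix.
corr = np.corrcoef(np.column_stack([data[k] for k in names]), rowvar=False)
vif = dict(zip(names, np.diag(np.linalg.inv(corr))))
for name, value in vif.items():
    print(f"{name} VIF = {value:.3f}")
```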
Should I choose X and Y because their p-value is above 0.05, i.e. the correlation between them is not statistically significant? Is looking at the Variance Inflation Factor and the Pearson correlation analysis alone enough to determine which features should be chosen for clustering?
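One way to probe this empirically would be to run average-linkage agglomerative clustering on each candidate feature subset and compare the resulting labelings. The sketch below does that with scikit-learn on synthetic stand-in data, not my real dataset. The number of clusters (3), the standardization step, the default Euclidean distance, and the use of the adjusted Rand index as the agreement measure are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real dataset): columns are X, Y, Z,
# with Y and Z built to be positively correlated.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.7 * z + 0.7 * rng.normal(size=n)
features = np.column_stack([x, y, z])


def cluster(cols, n_clusters=3):
    """Average-linkage clustering on the chosen columns (n_clusters is an assumption)."""
    # Standardize so no feature dominates the Euclidean distances.
    scaled = StandardScaler().fit_transform(features[:, cols])
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="average")
    return model.fit_predict(scaled)


labels_xyz = cluster([0, 1, 2])  # all three features
labels_xy = cluster([0, 1])      # X and Y only

# Adjusted Rand index: 1.0 means identical partitions, values near 0 mean
# agreement no better than chance.
ari = adjusted_rand_score(labels_xyz, labels_xy)
print(f"Agreement between XYZ and XY clusterings (ARI): {ari:.3f}")
```

A low agreement score would suggest that dropping Z changes the partition substantially; a high one would suggest the subset carries most of the same structure.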