How to cluster/group these data points (using K-Mean or Hirarachal clustering)
I have genes from different species
Gene A , Gene B, Gene C, ... Gene Z
Some Genes are similar to each other
A G are 96% similar
C H are 92% similar
G B are 89% similar
G T are 85% similar
.
.
.
K F are 52% similar
I want to classify these genes into groups of species
Species A, B, T, G are the same species Species C, H, N, R, L, P are the same species . . . K does not seem to be similar to any species (it is unknown or a species by itself) F does not seem to be similar to any species (it is unknown or a species by itself)
I know that I can use K-Mean to cluster these genes.
but not sure how to build the feature set to be used in K-Mean
all the examples online are for 2-dimensional datasets
something like this
So can someone help me with how to build this dataset features to be used with K-Mean
Topic hierarchical-data-format feature-extraction k-means
Category Data Science