Clustering with hierarchical data dependencies
I am currently looking into how to cluster data with hierarchical dependencies. Here is an example of the kind of problem I have in mind: we would like to cluster cities in order to identify those whose inhabitants have similar characteristics. As input data, I have characteristics such as the age, weight, height and sex of the inhabitants. Each city is therefore modeled by a vector of counts:
    block     component   meaning
    ------    ---------   ------------------------------------
    age       x_1         number of people aged 20 years old
              x_2         number of people aged 21 years old
              ...         ...
              x_k         number of people aged 79 years old
    weight    ...         number of people of weight 55kg
              ...         number of people of weight 56kg
              ...         ...
              ...         number of people of weight 100kg
              ...         number of people of weight 111kg
    height    ...         number of people of height 1.55m
              ...         number of people of height 1.56m
              ...         ...
              ...         number of people of height 2.02m
              ...         number of people of height 2.03m
    sex       ...         number of male inhabitants
              x_n         number of female inhabitants
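To make the representation concrete, here is a minimal sketch of how such a vector could be assembled from per-inhabitant records. The names `count_block` and `city_vector`, the use of whole centimetres for height, and the `"M"`/`"F"` coding are assumptions made for illustration; only the value ranges come from the description above.

```python
import numpy as np

# Value ranges taken from the question; heights are expressed in whole
# centimetres here (an assumption) so that every block uses integer bins.
AGE_RANGE = (20, 79)
WEIGHT_RANGE = (55, 111)
HEIGHT_RANGE = (155, 203)


def count_block(values, lo, hi):
    """Count how many values fall on each integer from lo to hi inclusive."""
    values = np.asarray(values, dtype=int)
    values = values[(values >= lo) & (values <= hi)]  # drop out-of-range records
    return np.bincount(values - lo, minlength=hi - lo + 1)


def city_vector(ages, weights_kg, heights_cm, sexes):
    """Return the concatenated count vector and the per-variable blocks."""
    blocks = {
        "age": count_block(ages, *AGE_RANGE),
        "weight": count_block(weights_kg, *WEIGHT_RANGE),
        "height": count_block(heights_cm, *HEIGHT_RANGE),
        "sex": np.array([
            sum(s == "M" for s in sexes),  # hypothetical coding of the sex field
            sum(s == "F" for s in sexes),
        ]),
    }
    return np.concatenate(list(blocks.values())), blocks


# Toy city with three inhabitants
vec, blocks = city_vector(
    ages=[20, 20, 79],
    weights_kg=[55, 111, 60],
    heights_cm=[155, 203, 170],
    sexes=["M", "F", "F"],
)
```

Keeping the `blocks` dictionary next to the flat vector preserves the information about which components belong to the same underlying variable, which is exactly what gets lost once everything is concatenated.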
If I want to use k-means, the input features are not independent: there is a strong correlation between the counts for different ages, different heights, etc. Moreover, it seems illogical to me to use separate dimensions for components that all describe the same underlying variable.
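A small numerical sketch of this concern, using only the age block: with plain Euclidean distance (the distance k-means relies on), a city where everyone is 20 is exactly as far from a city where everyone is 21 as from a city where everyone is 79, because each age is treated as an unrelated dimension. For contrast, the sketch also compares cumulative distributions, which for unit-spaced bins is the 1-D earth mover's distance; this is just one possible way to make the ordering of the bins visible, not necessarily the right method for the problem. The helper names and the toy cities are hypothetical.

```python
import numpy as np

N_AGE_BINS = 60  # ages 20..79, as in the vector above


def one_age_city(age, population=100):
    """Toy age histogram: every inhabitant has the same age."""
    hist = np.zeros(N_AGE_BINS)
    hist[age - 20] = population
    return hist


def euclidean(u, v):
    """Distance that treats every bin as an independent dimension."""
    return float(np.linalg.norm(u - v))


def cdf_distance(u, v):
    """L1 distance between cumulative distributions of two histograms.

    For 1-D histograms on unit-spaced bins this equals the earth mover's
    (Wasserstein-1) distance, so it grows with how far the mass is shifted.
    """
    u_cdf = np.cumsum(u / u.sum())
    v_cdf = np.cumsum(v / v.sum())
    return float(np.abs(u_cdf - v_cdf).sum())


city_20 = one_age_city(20)
city_21 = one_age_city(21)
city_79 = one_age_city(79)

# Euclidean: both comparisons give the same value
print(euclidean(city_20, city_21), euclidean(city_20, city_79))

# CDF-based: a one-year shift is much closer than a 59-year shift
print(cdf_distance(city_20, city_21), cdf_distance(city_20, city_79))
```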
I'm not sure whether there are methods designed for this kind of problem, or whether it is just a matter of representing the data differently.
Topic unsupervised-learning clustering machine-learning
Category Data Science