How to create group IDs for people in longitudinal data
I have a large data set containing individuals and the addresses where they live. I want to create a group ID based on shared addresses (the working idea: people who share an address can be treated as part of the same family/household). Using that household ID, my PI wants to investigate how households/families migrate over time in response to cost-of-living increases and decreases.
However, the difficulty is that the dataset and the analysis are longitudinal: the data spans multiple consecutive time periods. We want to attach a household ID to each person that they can be associated with at any point in the data. This raises a few issues:
- People move in/out of households.
- People start their own households with other people.
- The dataset doesn't track people under 18, so they first appear in the period in which they turn 18.
- etc
The PI is flexible on the definition of a household, and so far we have come up with a few ideas:
Anchor households: create household IDs from the linked addresses at the beginning of the study, and keep those individuals associated with this starting ID. Issue: individuals breaking/splitting off from their households resulting in
Captain/HeadofHouse: follow one individual in each household from the start of the data, and group people who come into their household under that person's Captain ID. Issue: hard to decide who gets assigned captain.
Multiple IDs: assign household IDs separately at each period of data and then build a graph of the associations between them. Best idea so far, but it might make the analysis a little more difficult.
Webbing: use connected components to link individuals across the time periods, eliminating weak connections (1-2 associations or fewer). E.g. I would be associated with every roommate my roommate has had. Issue: super messy (although it might be fun to try to implement).
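To make the "Multiple IDs" idea concrete, here is a minimal sketch of what I have in mind, using only the standard library. The toy records, the `(period, person_id, address)` layout, and the function names `period_households` and `link_households` are all made up for illustration; my real data is laid out differently. Each (period, address) pair acts as a per-period household ID, and two households in consecutive periods are linked when they share at least one member.

```python
from collections import defaultdict

# Hypothetical toy records: (period, person_id, address).
records = [
    (1, "ann", "12 Oak St"),
    (1, "bob", "12 Oak St"),
    (1, "cat", "9 Elm Rd"),
    (2, "ann", "12 Oak St"),
    (2, "bob", "40 Pine Ave"),
    (2, "cat", "40 Pine Ave"),
    (2, "dan", "12 Oak St"),
]

def period_households(records):
    """Map each (period, address) key to the set of people living there."""
    households = defaultdict(set)
    for period, person, address in records:
        households[(period, address)].add(person)
    return dict(households)

def link_households(households):
    """Link households in consecutive periods that share a member.

    Returns a sorted list of (earlier_key, later_key, n_shared) edges.
    The pairwise loop is quadratic, which is fine for a sketch only.
    """
    edges = []
    for (p1, a1), members1 in households.items():
        for (p2, a2), members2 in households.items():
            if p2 == p1 + 1:
                shared = members1 & members2
                if shared:
                    edges.append(((p1, a1), (p2, a2), len(shared)))
    return sorted(edges)

households = period_households(records)
edges = link_households(households)
for earlier, later, n_shared in edges:
    print(earlier, "->", later, "shared members:", n_shared)
```

In this toy example the period-1 household at 12 Oak St splits in two: one edge leads to the same address and another leads to 40 Pine Ave. That kind of split is exactly what I would want the analysis to be able to follow.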
So I am looking for resources or suggestions on how to deal with a longitudinal grouping problem. So far I have looked into connected components, associative groups, and graph theory. If you have any suggestions I would be very grateful. I am using Python, so any library suggestions would be appreciated as well.
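For reference, this is roughly how I picture the connected-components ("webbing") version, again as a sketch under assumptions rather than a settled method. The toy records, the function names, and the `min_periods` threshold are hypothetical. The code counts how many periods each pair of people shared an address, drops pairs below the threshold, and groups whoever remains connected using a small union-find. I believe a graph library such as networkx could replace the union-find part with its `connected_components` function, but I have kept this to the standard library.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (period, person_id, address).
records = [
    (1, "ann", "A"), (1, "bob", "A"), (1, "cat", "A"), (1, "dan", "B"),
    (2, "ann", "A"), (2, "bob", "A"), (2, "cat", "C"), (2, "dan", "C"),
    (3, "ann", "A"), (3, "bob", "A"), (3, "cat", "C"), (3, "dan", "C"),
]

def co_residence_counts(records):
    """Count, for each pair of people, the periods in which they shared an address."""
    by_household = defaultdict(set)
    for period, person, address in records:
        by_household[(period, address)].add(person)
    counts = Counter()
    for members in by_household.values():
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts

def connected_groups(people, counts, min_periods=2):
    """Group people linked by pairs that co-resided in at least min_periods periods."""
    parent = {p: p for p in people}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= min_periods:  # weak links below the threshold are ignored
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in people:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())

people = {person for _, person, _ in records}
counts = co_residence_counts(records)
print(connected_groups(people, counts, min_periods=2))
print(connected_groups(people, counts, min_periods=1))
```

With the threshold at 2, the single period cat spent with ann and bob is ignored, so the toy data splits into two groups. With the threshold at 1, everyone collapses into one group, which is the "super messy" behaviour I am worried about.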
Please let me know if I need to explain anything further, or if there is some other information which would be helpful.
Topic association-rules python
Category Data Science