How to deal with a potentially multi-valued categorical variable
I'm building a model that has some categorical variables as inputs. I have dealt with this sort of data before and applied different techniques, such as creating dummy variables and factor scoring. However, I now have a different type of problem for which I cannot see an obvious best answer.
For each individual we can have multiple instances of this categorical variable $X$. When this happens with numerical variables, I usually take the max, mean, or min, depending on context. Of course, one can use that context to build something similar here. However, I'm curious about a general approach.
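For concreteness, the numerical case could look roughly like the minimal sketch below. The data, the identifiers `a`, `b`, `c`, and all variable names are made up for illustration; only the min/mean/max aggregation itself comes from the description above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format data: (individual_id, numeric_value),
# with several rows per individual. Values are illustrative only.
records = [
    ("a", 3.0),
    ("a", 5.0),
    ("b", 2.0),
    ("c", 7.0),
    ("c", 1.0),
    ("c", 4.0),
]

# Group the values by individual.
values_by_id = defaultdict(list)
for individual_id, value in records:
    values_by_id[individual_id].append(value)

# Collapse each group into one row of summary features.
features = {
    individual_id: {"min": min(vals), "mean": mean(vals), "max": max(vals)}
    for individual_id, vals in values_by_id.items()
}

for individual_id, row in features.items():
    print(individual_id, row)
```

Which of the three summaries is kept depends on context; the open question is what plays the same role when the values are categories rather than numbers.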
Assume that for each object (a row in our input matrix) we can have multiple entries of a categorical variable. Furthermore, assume that this variable can take many different values, and that the combination of values within each row can be relevant in context.
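To make the setup concrete, here is a small sketch of the data shape, assuming a long format with one (individual, category) pair per entry. The category names and identifiers are hypothetical. It also shows the most direct extension of dummy variables (one count per level), purely as an illustration of the layout and not as a proposed answer: with many levels this matrix becomes very wide, and the counts do not encode the combination as a feature in its own right.

```python
from collections import Counter, defaultdict

# Hypothetical long-format data: (individual_id, category). Illustrative only.
records = [
    ("a", "red"),
    ("a", "blue"),
    ("b", "green"),
    ("c", "red"),
    ("c", "red"),
    ("c", "green"),
]

# Collect every entry of the categorical variable for each individual.
categories_by_id = defaultdict(list)
for individual_id, category in records:
    categories_by_id[individual_id].append(category)

# Fixed column order so that every row has the same layout.
levels = sorted({category for _, category in records})

# One count per level: the dummy-variable idea extended to several entries per row.
count_rows = {
    individual_id: [Counter(cats)[level] for level in levels]
    for individual_id, cats in categories_by_id.items()
}

# The combination itself, kept as a set of distinct values per individual.
combinations = {
    individual_id: frozenset(cats) for individual_id, cats in categories_by_id.items()
}

print(levels)
for individual_id in count_rows:
    print(individual_id, count_rows[individual_id], sorted(combinations[individual_id]))
```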
What would be a general approach to this variable?
Topic dummy-variables feature-engineering aggregation categorical-data
Category Data Science