What is the best way to engineer features that can hold more than one value?
I want to parse this data and keep it in a pandas DataFrame for further analysis. For example, I have people's profile data, which consists of:
Name, Age, Gender, Company, Degree
Name, Age and Gender are easy to store because each has a single value, but Company can have one or several values: someone may have worked at Google, at Microsoft, or at both Google and Microsoft.
The same applies to Degree: a person can have a single degree or several.
Right now I keep them as comma-separated values, so if someone has more than one company the value is "Google, Microsoft". When I encode them with, say, sklearn's LabelEncoder, I get a different code for each string, like Google = 1, Microsoft = 2, and "Google, Microsoft" = 3.
I don't think this is a good representation. As the data grows, the number of distinct combinations will explode. Also, if I want to find everyone who worked at Google, I might not get the correct answer, because code 1 (Google) and code 3 (Google, Microsoft) will never match.
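To make the problem concrete, here is a minimal sketch of the encoding described above. The sample rows and the column name `Company_code` are made up for illustration. Note that `LabelEncoder` assigns 0-based codes in sorted order, so the actual integers differ from the 1, 2, 3 used in the description, but the effect is the same.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; names and values are invented for illustration.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Company": ["Google", "Microsoft", "Google, Microsoft"],
})

encoder = LabelEncoder()
df["Company_code"] = encoder.fit_transform(df["Company"])

# Every distinct string becomes its own class, so the combined
# value "Google, Microsoft" gets a code unrelated to "Google".
codes = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))

# Filtering on the code for "Google" misses person C, who also worked there.
google_only = df[df["Company_code"] == codes["Google"]]
print(codes)
print(google_only)
```

Only person A is returned by the filter, even though person C also worked at Google.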
Is there a better way to handle such multi-valued data?
Topic feature-engineering machine-learning
Category Data Science