How to deal with name strings in large data sets for ML?
My data set contains multiple columns with names (first name, last name, etc.). I want to feed them into an anomaly-detection model such as Isolation Forest later.
Word embedding techniques are mostly designed for longer text sequences, not for single-word strings as in this case, so I don't think they would work well here. Label encoding and label binarization don't seem suitable for names either: label binarization produces one column per distinct value, and names have a great many distinct values, while label encoding maps names to arbitrary integers that allow no meaningful comparison between names.
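To illustrate the problem with both encodings, here is a minimal sketch using scikit-learn. The five names are made up for the example; my real columns contain far more distinct values.

```python
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

# Hypothetical sample of first names (not taken from my real data).
names = ["Anna", "Anne", "Zoe", "Bob", "Anna"]

# Label encoding: one integer per distinct name, assigned in sorted order.
label_codes = LabelEncoder().fit_transform(names)

# Label binarization: one indicator column per distinct name.
one_hot = LabelBinarizer().fit_transform(names)

print(dict(zip(names, label_codes)))  # name -> integer code
print(one_hot.shape)                  # (rows, number of distinct names)
```

With label encoding, "Anne" ends up exactly as far from "Bob" as from "Anna", so the integer distance says nothing about how similar two names are. With label binarization, the number of columns grows with the number of distinct names.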
Are there other approaches for encoding or transforming name information so that it can be used with ML algorithms?
Topic preprocessing classifier encoding nlp python
Category Data Science