How to process word similarity and categorize a group of words to a single word
I am new to this area and have been searching for some time, only to find several different possible approaches but nothing concrete.
Say I have a word list such as email_addr, email, email_address, address or, with less similar members, first, first_name, firstName, christianName, christian_name, name. What would be the most suitable approach to map each of those lists to a single word, such as email or givenName respectively?
I've seen articles proposing Levenshtein distance, fuzzy matching, difference algorithms, and support vector machines, but I don't think any of them quite satisfies the requirement, unless I am missing something.
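To illustrate my doubt about plain edit distance, here is a small sketch. The `levenshtein` helper is my own textbook implementation, not taken from any library. It shows that email_addr is exactly five edits away from email (the five extra characters), whereas a name like christianName has little character overlap with givenName beyond the shared "Name" suffix, so character-level distance alone cannot know the two mean the same thing.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Compare each raw column name against the label I would like it to get.
    for name in ["email_addr", "email_address", "address"]:
        print(name, "->", levenshtein(name, "email"))
    for name in ["first_name", "christianName", "name"]:
        print(name, "->", levenshtein(name, "givenName"))
```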
I would appreciate any links or pointers for further research.
Essentially, the objective is to categorize all column names in a data set so I can map them to a method for each type of column to generate mock data.
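To make the goal concrete, this is roughly the shape of what I am after. It is only a naive sketch: the category labels, the keyword lists, the 0.6 threshold, and the placeholder generators are all made up for illustration. It uses nothing but Python's standard `re` and `difflib` modules, so it is exactly the kind of string-similarity approach I suspect will not generalize without a hand-maintained synonym list.

```python
import re
from difflib import SequenceMatcher

# Hypothetical categories, each with hand-picked keywords (assumption).
CATEGORIES = {
    "email": ["email", "email address", "mail"],
    "givenName": ["first name", "christian name", "given name", "forename"],
}

# Placeholder mock-data generators, one per category (assumption).
GENERATORS = {
    "email": lambda: "user@example.com",
    "givenName": lambda: "Alice",
}


def normalize(column: str) -> str:
    """Split snake_case and camelCase into lowercase, space-separated tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column)
    tokens = re.split(r"[^A-Za-z0-9]+", spaced)
    return " ".join(t.lower() for t in tokens if t)


def categorize(column: str, threshold: float = 0.6):
    """Return the best-scoring category, or None if no score reaches threshold."""
    text = normalize(column)
    best_label, best_score = None, 0.0
    for label, keywords in CATEGORIES.items():
        for keyword in keywords:
            score = SequenceMatcher(None, text, keyword).ratio()
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else None


if __name__ == "__main__":
    for col in ["email_addr", "firstName", "christian_name", "zip_code"]:
        label = categorize(col)
        sample = GENERATORS[label]() if label else None
        print(col, "->", label, sample)
```

Normalizing first means firstName and first_name collapse to the same string, so the similarity step only has to deal with genuinely different wording.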
Topic fuzzy-classification classification machine-learning
Category Data Science