How to process word similarity and categorize a group of words to a single word
I am new to this area and have been searching for some time, only to find several different possible approaches but nothing concrete.
If I have a word list such as email_addr, email, email_address, address, or a more dissimilar one such as first, first_name, firstName, christianName, christian_name, name, what would be the most suitable approach to classify each of those lists to a single word, like email or givenName respectively?
I've seen some articles proposing Levenshtein distance, fuzzy matching, difference algorithms, and support vector machines, but I don't think any of them quite satisfies the requirement, unless I am missing something.
I would appreciate any links or directions for further research.
Essentially, the objective is to categorize all column names in a data set so I can map them to a method for each type of column to generate mock data.
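To make the goal concrete, here is a naive sketch of the kind of mapping I am after. Everything in it is my own placeholder: the keyword table, the function names (tokenize, classify), and the mock generators are hypothetical and hand-written, and it is only meant to illustrate the problem, not to be a real solution.

```python
import re

# Hypothetical hand-written keyword table: canonical label -> tokens hinting at it.
CANONICAL_KEYWORDS = {
    "email": {"email", "mail"},
    "givenName": {"first", "christian", "given", "forename"},
}

# Hypothetical mock-data generators, one per canonical label.
MOCK_GENERATORS = {
    "email": lambda i: f"user{i}@example.com",
    "givenName": lambda i: f"Name{i}",
}


def tokenize(column_name):
    """Split snake_case and camelCase column names into lowercase tokens."""
    # Insert a space at lower-to-upper boundaries (firstName -> first Name).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", column_name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def classify(column_name):
    """Return the label whose keywords overlap the tokens most, or None."""
    tokens = set(tokenize(column_name))
    best_label, best_hits = None, 0
    for label, keywords in CANONICAL_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


for column in ["email_addr", "firstName", "christian_name", "address", "name"]:
    label = classify(column)
    sample = MOCK_GENERATORS[label](1) if label else None
    print(column, label, sample)
```

This handles email_addr, firstName and christian_name, but it returns None for address and name on their own, because nothing in the table says which category they belong to. That ambiguity, plus having to maintain the keyword table by hand, is exactly what I am hoping a proper similarity or classification approach would solve.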
Topic fuzzy-classification classification machine-learning
Category Data Science