How should hyphenated English words be processed for an NLP problem?
I'm preprocessing an English text dataset and I encounter hyphenated words like 'well-known'. Which of these would be most useful:
- remove the hyphen as a special character and treat the result as a single word, 'wellknown', or
- split the word into two, 'well' and 'known', or
- use all three tokens, 'well', 'known', and 'wellknown', when creating the vectors (BOW/TF-IDF) for model input?
Any quick help on this would be much appreciated. Thank you.
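To make the three options concrete, here is a minimal sketch of what each one would produce for a sample sentence. The helper name `tokenize`, its `mode` argument, and the regular expression are assumptions made for illustration only; they are not part of any library.

```python
import re
from collections import Counter

# Assumed token pattern: runs of letters/digits, with internal hyphens
# kept so that "well-known" is first captured as one token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text, mode):
    """Hypothetical helper. mode is one of:
    'join'  -> 'well-known' becomes ['wellknown']
    'split' -> 'well-known' becomes ['well', 'known']
    'all'   -> 'well-known' becomes ['well', 'known', 'wellknown']
    """
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        if "-" not in tok:
            tokens.append(tok)
            continue
        parts = tok.split("-")
        joined = "".join(parts)
        if mode == "join":
            tokens.append(joined)
        elif mode == "split":
            tokens.extend(parts)
        elif mode == "all":
            tokens.extend(parts)
            tokens.append(joined)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return tokens


if __name__ == "__main__":
    sample = "A well-known author"
    for mode in ("join", "split", "all"):
        # Counter gives the raw bag-of-words counts for each option.
        print(mode, Counter(tokenize(sample, mode)))
```

The resulting token lists could then be fed into whatever BOW/TF-IDF step is used. If that step is scikit-learn's `CountVectorizer` or `TfidfVectorizer`, a callable like this can be passed through the `tokenizer` parameter; note that, as far as I know, their default token pattern already splits on hyphens, which corresponds to the second option.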
Topic bag-of-words tokenization tfidf preprocessing nlp
Category Data Science