Is it good practice to remove the numeric values from the text data during preprocessing?
I'm preprocessing a text dataset that contains several kinds of numeric values, such as:
- dates (1st July)
- years (2019)
- tentative values (3-5 years / 10+ advantages)
- unique values (room no 31 / user rank 45)
- percentages (100%)
Is it recommended to discard these numeric values before building a vectorizer (BoW/TF-IDF) for model development (classification/regression)?
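To make the question concrete, here is a minimal sketch of the two options I'm weighing: dropping numeric tokens entirely, or replacing them with a placeholder token so the vectorizer still sees that a number was present. It uses only the standard library; the sample sentence, the regex, the `<num>` placeholder, and the function names are all hypothetical and only meant to illustrate the idea, not a recommended implementation.

```python
import re
from collections import Counter

# Hypothetical sample sentence covering the numeric cases listed above.
text = "Booked room no 31 on 1st July 2019 for 3-5 years with 100% refund"

# Assumption: any whitespace-delimited token containing a digit counts as
# "numeric", e.g. "31", "1st", "2019", "3-5", "100%".
NUMERIC_TOKEN = re.compile(r"\S*\d\S*")

def drop_numerics(s):
    """Option A: discard every token that contains a digit."""
    return " ".join(NUMERIC_TOKEN.sub(" ", s).split())

def mask_numerics(s, placeholder="<num>"):
    """Option B: replace each numeric token with a shared placeholder."""
    return NUMERIC_TOKEN.sub(placeholder, s)

def bag_of_words(s):
    """Toy bag-of-words: lowercase, split on whitespace, count tokens."""
    return Counter(s.lower().split())

dropped = drop_numerics(text)
masked = mask_numerics(text)

print(dropped)
print(masked)
print(bag_of_words(masked)["<num>"])
```

With option A the numbers vanish from the vocabulary; with option B they collapse into one feature, so "room no 31" and "room no 45" look identical to the model.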
Any quick help on this is much appreciated. Thank you
Topic bag-of-words hashingvectorizer tokenization tfidf nlp
Category Data Science