Should I create a separate feature for each specific word I find in the text, or one feature for all of them?

Question

Ir8_mind

27 January 2022, 05:27

I am doing feature engineering for my classification task. My dataframe has a column with text messages. I decided to create a binary feature that indicates whether the text contains any of the words call, phone, mobile, @gmail, mail, facebook. Now I wonder whether I should create a separate binary feature for each word (or group of words), or one feature for all of them. How can I check which solution is better? Is there a metric for this, and what do people usually do in practice? Thanks!
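A minimal sketch of the two options being compared, assuming a pandas dataframe whose text column is named `text`. The column name, the output column names, and the grouping of the keywords are hypothetical. It uses case-insensitive substring matching, so `call` also matches `recall` and `mail` also matches `@gmail`; add word boundaries (`\b`) to the pattern if that matters for your data.

```python
import re

import pandas as pd

# Hypothetical grouping of the keywords from the question.
KEYWORD_GROUPS = {
    "has_call": ["call"],
    "has_phone": ["phone", "mobile"],
    "has_mail": ["mail", "@gmail"],
    "has_facebook": ["facebook"],
}


def contains_any(texts: pd.Series, words: list[str]) -> pd.Series:
    """Return 1 where a message contains any of `words` (case-insensitive substring), else 0."""
    pattern = "|".join(re.escape(w) for w in words)
    # na=False treats missing messages as "no keyword found".
    return texts.str.contains(pattern, case=False, regex=True, na=False).astype(int)


def single_feature(df: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Option 1: one column that is 1 if any keyword from any group appears."""
    all_words = [w for words in KEYWORD_GROUPS.values() for w in words]
    return pd.DataFrame({"has_any_contact_word": contains_any(df[text_col], all_words)})


def per_group_features(df: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Option 2: one binary column per keyword group."""
    return pd.DataFrame(
        {name: contains_any(df[text_col], words) for name, words in KEYWORD_GROUPS.items()}
    )


df = pd.DataFrame({"text": ["Call me on my mobile", "Send it to john@gmail.com", "See you tomorrow"]})
print(single_feature(df))
print(per_group_features(df))
```

Either result shares the original index, so it can be attached to the source dataframe with `df.join(...)`.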

Topics: dummy-variables, feature-engineering, nlp, machine-learning

Category: Data Science

Ashwiniku918 · Accepted Answer · 27 January 2022, 05:27

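You should create a separate binary feature for each group of words. If you create just one feature, it will be 1 for every row where any of the words is found and 0 otherwise, so the model cannot tell which word (for example a phone number versus a Facebook mention) actually appeared. With n groups you need n features, one per group: the usual "n-1 dummies" rule applies only to mutually exclusive categories, and here a single message can contain words from several groups at once, so none of the columns is redundant.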
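To check which option works better, as the question asks, one common approach is to train the same model on each feature set and compare a cross-validated metric. This is a minimal sketch assuming pandas and scikit-learn; the keyword groups, the logistic regression model, ROC AUC as the metric, and the toy data are all illustrative choices, not part of the answer. In a real task you would add the keyword columns to your other features and compare the full models.

```python
import re

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical keyword groups; each becomes its own binary column.
KEYWORD_GROUPS = {
    "has_call": ["call"],
    "has_phone": ["phone", "mobile"],
    "has_mail": ["mail", "@gmail"],
    "has_facebook": ["facebook"],
}


def contains_any(texts: pd.Series, words: list[str]) -> pd.Series:
    """1 where a message contains any of `words` (case-insensitive substring), else 0."""
    pattern = "|".join(re.escape(w) for w in words)
    return texts.str.contains(pattern, case=False, regex=True, na=False).astype(int)


def build_features(texts: pd.Series, per_group: bool) -> pd.DataFrame:
    """Either one column per group, or a single 'any keyword' column."""
    if per_group:
        return pd.DataFrame({name: contains_any(texts, ws) for name, ws in KEYWORD_GROUPS.items()})
    all_words = [w for ws in KEYWORD_GROUPS.values() for w in ws]
    return pd.DataFrame({"has_any_contact_word": contains_any(texts, all_words)})


def compare_feature_sets(texts: pd.Series, y: pd.Series, cv: int = 5) -> dict[str, float]:
    """Mean cross-validated ROC AUC of the same model trained on each feature set."""
    scores = {}
    for name, per_group in [("single", False), ("per_group", True)]:
        X = build_features(texts, per_group)
        model = LogisticRegression()
        scores[name] = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    return scores


# Hypothetical toy data, only to show the call; use your real dataframe instead.
texts = pd.Series([
    "Call me tonight", "My phone number is below", "Write to me at a@gmail.com",
    "Add me on facebook", "See you tomorrow", "Thanks for the update",
] * 5)
labels = pd.Series([1, 1, 1, 0, 0, 0] * 5)
print(compare_feature_sets(texts, labels))
```

A higher mean score suggests that feature set is more useful for this model and data; on small datasets, also look at the spread of the per-fold scores before drawing conclusions.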
