NLP Basic input doubt

I have a basic doubt in NLP.

When we use traditional models like decision trees, the feature column order matters: the first column is fixed to some particular attribute. So if I use TF-IDF, each word has a fixed column index and the model can learn from it.

But in the case of an LSTM, sentences can be jumbled. For example: "There is heavy rain" vs. "Heavy rain is there".

In these two sentences, the word "heavy" occurs in different positions. So in order for the model to understand that we have passed the word "there", we would require some unique representation for each word, either one-hot or word2vec. Is my understanding so far right?

My final doubt is: if I use TF-IDF for the above, how will it work? How will the model understand that the word "heavy" was passed? This doubt has been bugging me for long. Kindly clarify! Thanks a ton!

Tags: lstm, tfidf, deep-learning, nlp, machine-learning

Category: Data Science


First of all, in a bag-of-words (BOW) model, word order is not represented. A decision tree does not care whether "heavy" is the first feature or the last; it works the same either way, because BOW only models the presence of words in documents. So your final doubt is actually nothing to worry about: both sentences contain the word "heavy", so in the "heavy" column both get a $1$ (or the TF, TF-IDF, or whatever count you use).
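To make this concrete, here is a minimal bag-of-words sketch in plain Python (a toy count vectorizer, not a real library API): both of your example sentences map to exactly the same vector, because only word counts survive, not positions.

```python
from collections import Counter

def bow_vector(sentence, vocab):
    # Count word occurrences; position information is discarded here.
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]

# Fixed vocabulary -> fixed column index per word, as with TF-IDF.
vocab = sorted({"there", "is", "heavy", "rain"})  # ['heavy', 'is', 'rain', 'there']

v1 = bow_vector("There is heavy rain", vocab)
v2 = bow_vector("Heavy rain is there", vocab)

print(v1)        # [1, 1, 1, 1]
print(v1 == v2)  # True: identical vectors despite different word order
```

Replacing the raw counts with TF-IDF weights changes the numbers in each column, but not this conclusion: the two jumbled sentences still get identical vectors.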

An LSTM, on the other hand, does see the order, because it has a memorising behaviour: each word such as "heavy" is mapped to an index (or an embedding), and over training the model learns the probability of that word appearing at each point in the sequence. In other words, it models your text as a sequence. So your understanding is right: for an LSTM you do need a unique representation for each word, such as one-hot or word2vec.
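The input side of this can be sketched as follows (a hypothetical toy vocabulary, not any particular library's tokenizer): each word gets a fixed integer index, and a sentence becomes a sequence of those indices, so jumbling the words produces a different input even though the word set is the same.

```python
# Hypothetical word-to-index mapping, as would feed an embedding layer.
word_to_index = {"there": 1, "is": 2, "heavy": 3, "rain": 4}

def encode(sentence):
    # Preserve word order: one index per token, in sentence order.
    return [word_to_index[w] for w in sentence.lower().split()]

s1 = encode("There is heavy rain")  # [1, 2, 3, 4]
s2 = encode("Heavy rain is there")  # [3, 4, 2, 1]

# Same multiset of words, different sequences: an LSTM consuming these
# one index (or embedding) at a time sees two different inputs,
# whereas a BOW/TF-IDF model would see two identical vectors.
print(sorted(s1) == sorted(s2))  # True  (same words)
print(s1 == s2)                  # False (different order)
```

This is exactly the distinction in your question: the unique per-word representation (index, one-hot, or word2vec vector) plus the sequential input is what lets the LSTM tell "There is heavy rain" apart from "Heavy rain is there".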
