How to deal with one output for multiple inputs?

Hi!

I want to train a model that predicts the sentiment of news headlines. I have multiple unordered news headlines per day, but only one sentiment score per day.

What is a convenient way to handle this many-to-one mapping?

I could:

  • Concatenate all headlines into one string, but that feels a bit wrong: an LSTM or CNN would pick up cross-sentence word relations that don't exist.
  • Predict one score per headline (1:1) and average the scores at inference time. But that might miss some cross-headline dependencies.
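For reference, the second option is simple to express. Below is a minimal sketch where `score_headline` stands in for a hypothetical per-headline sentiment model (not something from the question); averaging makes the daily result independent of headline order and count:

```python
def daily_sentiment(headlines, score_headline):
    """Aggregate per-headline sentiment scores into one daily score.

    score_headline: hypothetical model that maps one headline to a float.
    The mean is permutation-invariant and handles a variable number of
    headlines per day.
    """
    scores = [score_headline(h) for h in headlines]
    return sum(scores) / len(scores)

# Usage with a toy scoring function (for illustration only):
score = daily_sentiment(
    ["markets up", "markets down", "markets up"],
    lambda h: 1.0 if "up" in h else 0.0,
)
```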

I want the following:

  • only one value/category is predicted for multiple headlines
  • the order of the headlines doesn't matter (ideally without shuffling)
  • the number of headlines per day can be variable (I would also be open to just picking 10 random headlines)

What's the usual handling for this?

Topic deep-learning sentiment-analysis text-mining neural-network

Category Data Science


If you don't want any relationship between words of different sentences during encoding, you can encode your sentences separately; since each sentence is tokenized alone, no cross-sentence relations are introduced. Then concatenate the word embeddings, so the final embedding contains all the headlines you need for the single prediction, each encoded independently of the others.
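A minimal sketch of this encode-separately-then-concatenate idea, using a hypothetical word-to-vector lookup `embed` in place of a real embedding layer (names here are illustrative, not from any library):

```python
def encode_headline(headline, embed):
    # Each headline is tokenized and embedded on its own, so no
    # cross-headline word relations are ever created.
    return [embed[w] for w in headline.split()]

def encode_day(headlines, embed):
    # Concatenate the per-headline embedding sequences into one
    # sequence that feeds the single daily prediction.
    vectors = []
    for h in headlines:
        vectors.extend(encode_headline(h, embed))
    return vectors
```

With a real model you would replace `embed` with your embedding layer and feed `encode_day(...)` to the classifier head.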

If you use BERT, you need input ids and an attention mask; just proceed the same way (tokenize the sentences separately, with padding=False), and then concatenate the id vectors (or tensors, if you set the return_tensors parameter). Now you can create your own attention mask: a tensor with a run of 1s equal to the length of the concatenated ids, padded with 0s until you reach a length of 512 (or better, stop at the maximum length of your id vectors).

The last step is to pad your id vectors with 0s until you reach a length of 512, or better, stop at the maximum length of your id vectors (it depends on what you chose for the attention mask, since the ids and the mask must have the same length).
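Putting the id/mask steps together: the sketch below assumes you already tokenized each headline separately (with padding=False) into plain id lists, and builds the concatenated ids plus a matching attention mask. It uses 0 as the padding id, which is the default for BERT tokenizers but may differ for other models:

```python
def build_inputs(per_headline_ids, max_len=512, pad_id=0):
    """Concatenate separately-tokenized headlines into one model input.

    per_headline_ids: list of id lists, one per headline.
    Returns (input_ids, attention_mask), both of length max_len:
    1s in the mask mark real tokens, 0s mark padding.
    """
    # Concatenate the per-headline id vectors, truncating at max_len.
    ids = [tok for seq in per_headline_ids for tok in seq][:max_len]
    attention_mask = [1] * len(ids)
    # Pad ids and mask to the same fixed length.
    pad = max_len - len(ids)
    ids = ids + [pad_id] * pad
    attention_mask = attention_mask + [0] * pad
    return ids, attention_mask
```

To avoid wasting compute, you can pass `max_len` equal to the longest concatenated example in the batch instead of 512, as the answer suggests.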
