How can an ANN handle variable-sized inputs?

I have a dataset with a message (string) and an associated mood. I am trying to use an ANN to predict one of six moods from the encoded inputs.

This is what my X_train looks like:

array([list([1, 60, 2]),
   list([1, 6278, 14, 9137, 334, 9137, 8549, 1380, 7]),
   list([5, 107, 1, 2, 156]), ..., list([1, 2, 220, 41]),
   list([1, 2, 79, 137, 422, 877, 5, 230, 621, 18]),
   list([1, 11, 66, 1, 2, 9137, 175, 1, 6278, 5624, 1520])],
  dtype=object)

Since every array has a different length, the network won't accept it. What can I do about it?

PS: The encoded values were generated using keras.preprocessing.text.Tokenizer()

Topic ann numpy deep-learning

Category Data Science


I am not sure how well it will perform, but what about padding the shorter messages with a special token (say, zeros) so that they are all as long as the longest message?
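As a sketch of the padding idea, here is a minimal NumPy implementation. (Keras also ships a built-in helper, keras.preprocessing.sequence.pad_sequences, which does the same thing and pads with zeros by default; note that the Keras Tokenizer reserves index 0 and never assigns it to a word, so zero is a safe pad value.) The example sequences are taken from the question:

```python
import numpy as np

def pad_to_max(sequences, pad_value=0):
    """Right-pad every sequence with pad_value so all rows
    share the length of the longest sequence."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.int64)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq
    return padded

X_train = [
    [1, 60, 2],
    [1, 6278, 14, 9137, 334, 9137, 8549, 1380, 7],
    [5, 107, 1, 2, 156],
]
X_padded = pad_to_max(X_train)
print(X_padded.shape)  # (3, 9) -- one fixed-size row per message
```

The resulting 2-D integer array has a fixed second dimension, so it can be fed directly to a dense network (or, better, to an Embedding layer first).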

However, some kind of embedding would definitely work better (also for the sake of the target, predicting the sentiment), provided you have enough data.


One way is to encode your input into a fixed-size representation. That is, you can pad the sequences and run them through an RNN, such as an LSTM; its final hidden state is a fixed-size vector, which can then serve as the input to a feed-forward ANN.
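To make the idea concrete, here is a toy vanilla-RNN encoder in NumPy: it consumes token sequences of any length and always emits a vector of the same size. All dimensions and the random initialisation are illustrative assumptions; in practice you would use a trained Keras Embedding + LSTM instead of hand-rolled weights:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 10000   # assumption: size of the Tokenizer vocabulary
EMBED_DIM = 16       # assumption: embedding dimension
HIDDEN_DIM = 32      # assumption: RNN hidden-state size

# Randomly initialised parameters (in a real model these are learned).
E = rng.normal(0.0, 0.1, (VOCAB_SIZE, EMBED_DIM))   # embedding table
W_x = rng.normal(0.0, 0.1, (EMBED_DIM, HIDDEN_DIM)) # input-to-hidden weights
W_h = rng.normal(0.0, 0.1, (HIDDEN_DIM, HIDDEN_DIM))# hidden-to-hidden weights
b = np.zeros(HIDDEN_DIM)

def encode(sequence):
    """Run a vanilla RNN over a variable-length token sequence and
    return the final hidden state: a fixed-size vector."""
    h = np.zeros(HIDDEN_DIM)
    for token in sequence:
        h = np.tanh(E[token] @ W_x + h @ W_h + b)
    return h

# Sequences of different lengths map to vectors of identical shape,
# which a plain feed-forward classifier can then consume.
v1 = encode([1, 60, 2])
v2 = encode([1, 6278, 14, 9137, 334, 9137, 8549, 1380, 7])
print(v1.shape, v2.shape)  # (32,) (32,)
```

The fixed-size output vector is what you would pass to the dense layers (ending in a 6-way softmax for the six moods).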
