Advantages of CNNs vs. LSTMs for sequence data like text or log files

When do you tend to use a CNN rather than an LSTM (or the other way round) for classification or generation tasks on sequential data like text or log data? What are the reasons for the decision, and what does it depend on? Are there any papers or statistics that confirm this?

I'm thinking of data like Linux log entries or short sentences of fewer than 20 words/tokens.

Personally, I would almost always use an LSTM, but I'm curious whether a CNN wouldn't be better in some cases, if it's possible to implement one in a meaningful way. On short sentences there isn't much room for a CNN to exploit, if I'm not mistaken.

Topic: cnn, lstm, text, sequence, deep-learning

Category: Data Science


The WaveNet paper is a good place to start for a discussion of Causal CNNs vs LSTMs for synthesis and classification. In that paper they actually train the network to do both at the same time, for example.
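
If you want to try the CNN route on log lines, a causal, dilated convolution stack in the WaveNet style is easy to set up. Here is a minimal sketch assuming PyTorch; the vocabulary size, channel width, number of classes, and dilation schedule are illustrative placeholders, not values from the paper. With kernel size 2 and dilations 1, 2, 4, 8 the receptive field is 16 tokens, which roughly covers the short sequences you mention.

    import torch
    import torch.nn as nn

    class CausalConvClassifier(nn.Module):
        def __init__(self, vocab_size=10000, channels=64, num_classes=5):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, channels)
            # Dilated convolutions grow the receptive field exponentially
            # with depth; left-padding each layer keeps them causal.
            self.convs = nn.ModuleList([
                nn.Conv1d(channels, channels, kernel_size=2, dilation=d)
                for d in (1, 2, 4, 8)
            ])
            self.head = nn.Linear(channels, num_classes)

        def forward(self, tokens):                  # tokens: (batch, seq_len)
            x = self.embed(tokens).transpose(1, 2)  # (batch, channels, seq_len)
            for conv in self.convs:
                pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
                x = torch.relu(conv(nn.functional.pad(x, (pad, 0))))
            return self.head(x.mean(dim=2))         # pool over time, classify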

For short tokenized sentences such as you describe, Transformers are probably the state of the art, though. In fact, they have recently been used in place of CNNs even for image classification tasks, so you don't necessarily need a convolutional encoder at all. And I believe the general view is that the Transformer encapsulates the function of both RNNs and CNNs and outperforms both on benchmarked tasks.
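
To make that concrete, a small Transformer-encoder classifier for sequences of at most 20 tokens could look like the sketch below, again assuming PyTorch; the model width, head count, layer count, and pooling choice are illustrative assumptions, not tuned values.

    import torch
    import torch.nn as nn

    class TransformerClassifier(nn.Module):
        def __init__(self, vocab_size=10000, d_model=128, num_classes=5,
                     max_len=20):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # Learned positional embeddings; fine for sequences this short.
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, num_classes)

        def forward(self, tokens):                   # tokens: (batch, seq_len)
            positions = torch.arange(tokens.size(1), device=tokens.device)
            x = self.embed(tokens) + self.pos(positions)
            x = self.encoder(x)                      # self-attention over tokens
            return self.head(x.mean(dim=1))          # mean-pool, then classify

Mean-pooling the encoder outputs keeps the sketch simple; a dedicated classification token, as in BERT, is the more common choice in practice.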

So, any of these networks could probably do the job you describe, but I would tend to use a Transformer as my first attempt. It is built for this kind of task and is quicker and easier to train than an RNN. I do suspect, though, that you would get good results with an LSTM on such short sequences; LSTMs mainly become difficult to train properly on very long sequences.
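
For comparison, here is the LSTM baseline under the same assumptions (PyTorch, placeholder sizes). On sequences under 20 tokens, the long-sequence training difficulties mentioned above should not be a problem.

    import torch
    import torch.nn as nn

    class LSTMClassifier(nn.Module):
        def __init__(self, vocab_size=10000, hidden=128, num_classes=5):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_classes)

        def forward(self, tokens):             # tokens: (batch, seq_len)
            x = self.embed(tokens)
            _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden)
            return self.head(h_n[-1])          # last hidden state -> classes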
