What non-linearities are best in denoising RNN autoencoders, and where should they go?

I’m using a denoising RNN autoencoder for a project involving motion capture data. This is my first time working with autoencoder architectures, and I’m wondering which non-linearities should be placed in these models and where they should go. This is my model as it stands:

import torch
import torch.nn as nn


class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(EncoderRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Vanilla RNN mapping each timestep from input_size to hidden_size features
        self.rnn_enc = nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
        self.relu_enc = nn.ReLU()

    def forward(self, x):
        # pred: (batch, seq_len, hidden_size) per-timestep outputs; the final hidden state is discarded
        pred, hidden = self.rnn_enc(x, None)
        pred = self.relu_enc(pred)
        return pred


class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, num_layers):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers

        # Vanilla RNN mapping the latent features back to output_size per timestep
        self.rnn_dec = nn.RNN(input_size=hidden_size, hidden_size=output_size, num_layers=num_layers, batch_first=True)
        self.relu_dec = nn.ReLU()

    def forward(self, x):
        pred, hidden = self.rnn_dec(x, None)
        pred = self.relu_dec(pred)
        return pred


class RNNAE(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(RNNAE, self).__init__()
        self.encoder = EncoderRNN(input_size, hidden_size, num_layers)
        self.decoder = DecoderRNN(hidden_size, input_size, num_layers)

    def forward(self, x):
        # Encode the (noisy) sequence, then decode back to the input dimensionality
        encoded_input = self.encoder(x)
        decoded_output = self.decoder(encoded_input)
        return decoded_output

As you can see, I have a ReLU non-linearity in each of the encoder and decoder networks, but I’m not sure whether this is the correct implementation for these architectures. The model learns on the data OK, but the MSE loss doesn’t really improve after the first few epochs, and I suspect it’s because of these ReLU functions.

Any advice on how to improve this, or on whether this is basically correct, would be very helpful.

Cheers!


@Leevo Many thanks for this. I've sorted out the issue: it is resolved by switching to LSTMs, whose cells have built-in non-linearities (sigmoid gates and a tanh activation). However, my question was about vanilla RNN architectures. What I've found is that this is a well-known issue with RNN autoencoders, and a general/partial solution is to use either ELU or sigmoid functions instead of ReLU. After playing around with these, I've found that LSTMs generally do a better job with less tinkering, so I've sided with that approach for the time being. Thanks again.
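
For later readers, here is a minimal sketch of what that swap could look like for the encoder in the question, replacing nn.RNN with nn.LSTM and dropping the ReLU on the per-timestep outputs (the class name EncoderLSTM and the constructor arguments simply mirror the question's code and are otherwise assumptions, not a definitive fix):

import torch
import torch.nn as nn

class EncoderLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(EncoderLSTM, self).__init__()
        # The LSTM cell already applies sigmoid/tanh non-linearities internally,
        # so no extra activation is strictly needed on its outputs.
        self.lstm_enc = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                                num_layers=num_layers, batch_first=True)

    def forward(self, x):
        # out: (batch, seq_len, hidden_size); (h, c) are the final states
        out, (h, c) = self.lstm_enc(x)
        return out

The decoder can be adapted in the same way, and since motion capture coordinates can be negative, leaving the decoder's final output linear (or using tanh on normalised data) tends to be safer than ReLU.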


I have some familiarity with recurrent encoder-decoder architectures, especially with seq2seq models. This is a standard implementation:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM
from tensorflow.keras.layers import RepeatVector, TimeDistributed, Dense
from tensorflow.keras.activations import elu, relu

seq2seq = Sequential([
    Bidirectional(LSTM(len_input), input_shape = (len_input, no_variables)),
    RepeatVector(len_input),
    Bidirectional(LSTM(len_input, return_sequences = True)),
    TimeDistributed(Dense(100, activation = elu)),
    TimeDistributed(Dense(1, activation = relu))
])

where len_input is the length of the input sequences, and no_variables is the number of variables that compose each observation (if it's a univariate series, set it to 1). The choice of Bidirectional() wrappers is optional.

The model works like this: the input LSTM layer encodes the sequential information into a vector containing a latent representation. This vector is replicated across timesteps by the RepeatVector() layer, decoded by the second LSTM, and then mapped to a prediction at each timestep by the Dense() layers (applied timestep by timestep via the TimeDistributed() wrapper). You don't need to worry much about non-linearities at this point, since they are handled by the internal LSTM gates and by the Dense() layers' activations. Fancier implementations would use stacked recurrent layers or deeper feed-forward blocks.
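
For completeness, a minimal way to compile and inspect the model above could look like the following (this assumes len_input and no_variables are already defined, and the optimizer choice is just an example):

# Reconstruction task, so mean squared error is a natural loss choice
seq2seq.compile(optimizer='adam', loss='mse')
seq2seq.summary()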

If your goal is to build a denoising autoencoder, you should pair each noisy input series with its "clean" counterpart as the target, so that the network learns how to reduce noise.
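
A minimal sketch of that pairing, assuming X_clean is an array of clean univariate sequences of shape (n_samples, len_input, 1) to match the final Dense(1) layer above, and with the noise scale and training settings chosen purely for illustration:

import numpy as np

# Corrupt the clean sequences with Gaussian noise; the network is trained
# to map each noisy sequence back to its clean counterpart.
noise = np.random.normal(loc=0.0, scale=0.01, size=X_clean.shape)
X_noisy = X_clean + noise

seq2seq.fit(X_noisy, X_clean, epochs=50, batch_size=32, validation_split=0.1)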
