Autoencoder train and test accuracy shooting to 99% within a few epochs

I am trying to train an autoencoder for dimensionality reduction and hopefully for anomaly detection. My data specifications are as follows.

  • Unlabeled
  • 1 million data points
  • 9 features

I am trying to reduce the data to 2 compressed features so I can visualize it better for clustering.

My autoencoder is as follows, where latent_dim = 2 and input_dim = 9:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

class Autoencoder(tf.keras.Model):
    def __init__(self, latent_dim, input_dim):
        super(Autoencoder, self).__init__()
        self.latent_dim = latent_dim
        self.input_dim = input_dim
        self.dropout_factor = 0.5
        # Encoder: 9 -> 8 -> 4 -> 2
        self.encoder = Sequential([
            # Dense(16, activation='relu', input_shape=(self.input_dim,)),
            # Dropout(self.dropout_factor),
            Dense(8, activation='relu'),
            Dropout(self.dropout_factor),
            Dense(4, activation='relu'),
            Dropout(self.dropout_factor),
            Dense(self.latent_dim, activation='relu')
        ])
        # Decoder: 2 -> 4 -> 8 -> 9
        self.decoder = Sequential([
            Dense(4, activation='relu', input_shape=(self.latent_dim,)),
            Dropout(self.dropout_factor),
            Dense(8, activation='relu'),
            Dropout(self.dropout_factor),
            # Dense(16, activation='relu'),
            # Dropout(self.dropout_factor),
            Dense(self.input_dim, activation=None)
        ])

    def call(self, inputs):
        encoder_out = self.encoder(inputs)
        return self.decoder(encoder_out)

Train/test split and model compilation

from sklearn.model_selection import train_test_split

ae_train_x, ae_test_x, ae_train_y, ae_test_y = train_test_split(scaled_df[COLUNMS_FOR_AUTOENCODER], scaled_df[COLUNMS_FOR_AUTOENCODER], test_size=0.33)
autoencoder = Autoencoder(latent_dim=2, input_dim=9)
autoencoder.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

Finally, training:

ae_history = autoencoder.fit(ae_train_x, ae_train_y, validation_data=(ae_test_x, ae_test_y), epochs=50)

Output of training

Epoch 1/50
22255/22255 [==============================] - 38s 2ms/step - loss: 0.3330 - accuracy: 0.9646 - val_loss: 0.2816 - val_accuracy: 0.9999
Epoch 2/50
22255/22255 [==============================] - 38s 2ms/step - loss: 0.2664 - accuracy: 0.9999 - val_loss: 0.2818 - val_accuracy: 0.9999
Epoch 3/50
22255/22255 [==============================] - 38s 2ms/step - loss: 0.2649 - accuracy: 0.9999 - val_loss: 0.2845 - val_accuracy: 0.9999
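
For reference, once training finishes, the 2-D bottleneck output I want for visualization can be pulled straight from the encoder sub-model. A minimal sketch, assuming the autoencoder instance trained above and matplotlib for the scatter plot:

import matplotlib.pyplot as plt

# Encode the held-out data down to the 2 latent features
latent = autoencoder.encoder.predict(ae_test_x)   # shape: (n_samples, 2)

# Scatter plot of the compressed representation to eyeball clusters
plt.scatter(latent[:, 0], latent[:, 1], s=1, alpha=0.3)
plt.xlabel('latent dim 1')
plt.ylabel('latent dim 2')
plt.show()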

What could be the problem? I think the network is just learning to pass the values through, but that should not be possible with the bottleneck and dropout layers. I have also tried reducing the number of layers, but the result is still the same. How can I fix it?

Tags: sparsity, keras, autoencoder, deep-learning, accuracy

Category: Data Science


The issue turned out to be MinMaxScaler. It was sorted out after switching from min-max normalization to standardization. (I am not sure, as I did not completely verify it, but it is my best guess, since the compressed output was mostly mapping a single feature.) One feature, X, had much higher values after normalization than the other features (probably because the other features contained outliers and were squashed towards 0), so feature X contributed most of the loss and the "accuracy".
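
For anyone hitting the same thing, here is a minimal sketch of the swap, assuming a hypothetical raw DataFrame df holding the 9 features. MinMaxScaler rescales each feature by its min and max, so a few large outliers squash the bulk of that feature's values towards 0, while StandardScaler (zero mean, unit variance) keeps the features on comparable scales:

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

# `df` is a hypothetical DataFrame with the 9 raw features
minmax_df = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
standard_df = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# Compare per-feature spread: with outliers, the min-max version can leave
# one feature dominating the reconstruction loss; the standardized one
# keeps all features at unit variance
print(minmax_df.std())
print(standard_df.std())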
