Why does using tanh worsen accuracy so much?

I was testing how different hyperparameters change the output of my multilayer perceptron on a regression problem:

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Checkpoint callback: keep only the best model seen on the validation split
# (the filename must be a string)
checkpoint = keras.callbacks.ModelCheckpoint('best_model.h5', save_best_only=True)

# Initialising the ANN
model = Sequential()

# Adding the input layer and the first hidden layer
model.add(Dense(units = 32, activation = 'relu', input_dim = X_train.shape[1]))

# Adding the second hidden layer
model.add(Dense(units = 8, activation = 'relu'))

# Adding the output layer (a single linear unit for regression)
model.add(Dense(units = 1))

# Compile with Adam and a mean-squared-error loss
optimizer = keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss='mean_squared_error')

# Fitting the ANN to the Training set
history = model.fit(X_train, y_train, batch_size = 100, epochs = 20,
                    verbose = 1, validation_split = 0.1, callbacks = [checkpoint])

and this model produced around 68% accuracy.

But when the activation function for the hidden layers was changed to 'tanh', the accuracy fell off a cliff to 0.07%!

I'm guessing it has something to do with tanh not being suited to regression?

Tags: activation-function, keras, neural-network



As posed, this question is difficult to answer because we do not have the data. In any case, the first thing I can say is that $\mathrm{tanh}$ is an asymptotic function: its derivative, $\mathrm{tanh}'(x) = 1 - \mathrm{tanh}^2(x)$, is ~$0$ over large regions of its input space, so once a unit's pre-activations drift into those saturated regions the gradients flowing back through it vanish and the model struggles to learn.

In practice this tends to produce less robust models.
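To make the saturation concrete, here is a minimal sketch (NumPy only, independent of the model above, with purely illustrative input values) comparing the gradient of tanh with that of relu at a few pre-activation values:

import numpy as np

# Illustrative pre-activation values only; not taken from the question's data
x = np.array([0.0, 1.0, 3.0, 5.0, 10.0])

# d/dx tanh(x) = 1 - tanh(x)^2, which collapses towards 0 as |x| grows
tanh_grad = 1.0 - np.tanh(x) ** 2

# d/dx relu(x) = 1 for x > 0, so the gradient does not shrink for positive inputs
relu_grad = (x > 0).astype(float)

for xi, tg, rg in zip(x, tanh_grad, relu_grad):
    print(f"x = {xi:5.1f}   tanh'(x) = {tg:.6f}   relu'(x) = {rg:.1f}")

For inputs only a few units away from zero, $\mathrm{tanh}'(x)$ is already below $0.01$, whereas relu keeps a gradient of $1$ for all positive inputs, which is consistent with the relu version of the network training much more easily here.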
