Appropriate loss function and metrics for regression task with mixed outputs

I'm trying to train an EfficientNet-based Keras model that takes an image as input and returns two numeric values as output.

Here's the model:

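from tensorflow.keras import Input, Model, layers
from tensorflow.keras.applications import EfficientNetB3

def prepare_model_eff(input_shape):
    inputs = Input(shape=input_shape)
    # Backbone without the classification head; all of its weights are fine-tuned.
    backbone = EfficientNetB3(include_top=False, input_shape=input_shape)
    backbone.trainable = True
    x = backbone(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(rate=0.1)(x)
    x = layers.BatchNormalization()(x)
    # Two independent linear heads, one per regression target.
    out_1 = layers.Dense(1, activation='linear', name='out_1')(x)
    out_2 = layers.Dense(1, activation='linear', name='out_2')(x)
    model = Model(inputs=inputs, outputs=[out_1, out_2])
    return model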

As far as I know, the most common metric for such tasks is Root Mean Square Error (RMSE):

from tensorflow.keras import backend as K

def rmse(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
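
To make the scale of this metric concrete, here is a minimal plain-Python sketch of the same formula. The target values are hypothetical, chosen only to match the magnitudes described in this question, and `rmse_plain` is an illustrative name, not part of Keras.

```python
import math

def rmse_plain(y_true, y_pred):
    """Root mean squared error over two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical targets on the scale described in the question.
y_true = [0.0011, -0.0005, 0.0001]
# Hypothetical predictions: a model that always outputs zero.
y_pred = [0.0, 0.0, 0.0]

score = rmse_plain(y_true, y_pred)
print(f"RMSE of an all-zero predictor: {score:.6f}")
```

RMSE is expressed in the same units as the targets, so with targets this small even a trivial predictor yields an RMSE on the order of 1e-3. A small absolute value is therefore not evidence of a good fit on its own; it only means something relative to a baseline like this one.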

I also use Mean Squared Error as a loss function:

from tensorflow.keras.optimizers import Adam

model.compile(
    loss={'out_1': 'mean_squared_error', 'out_2': 'mean_squared_error'},
    optimizer=Adam(learning_rate=0.001),  # `lr` is the deprecated spelling
    metrics={'out_1': rmse, 'out_2': rmse},
)
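
One caveat with a function-based metric like this: as far as I understand, Keras reports the running mean of the per-batch values, which for RMSE is not the same as the RMSE over all samples. The plain-Python sketch below illustrates the gap with hypothetical numbers and a hypothetical two-batch split; it does not call Keras itself.

```python
import math

def rmse_plain(y_true, y_pred):
    """Root mean squared error over two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data, split into two batches of two samples each.
y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [0.001, 0.001, 0.003, 0.003]
batches = [(y_true[:2], y_pred[:2]), (y_true[2:], y_pred[2:])]

# What averaging a per-batch metric produces.
per_batch = [rmse_plain(t, p) for t, p in batches]
mean_of_batches = sum(per_batch) / len(per_batch)

# RMSE computed once over every sample.
overall = rmse_plain(y_true, y_pred)

print(f"mean of per-batch RMSE: {mean_of_batches:.6f}")
print(f"RMSE over all samples:  {overall:.6f}")
```

The batch-averaged figure comes out lower than the true value whenever the error differs between batches. The built-in `tf.keras.metrics.RootMeanSquaredError` accumulates the squared error across batches before taking the square root, so it may be the safer choice for reporting.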

The problem is that my output values are quite unusual in several ways:

  • I have two numeric values (neurons) as output, not a single one
  • Both output values are pretty small numbers (e.g. 0.0003)
  • Output values could be both positive (0.0011, 0.0009), both negative (-0.0005, -0.0008) or mixed (0.0001, -0.0004)
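
One common way to deal with targets on this scale is to standardize each output before training and invert the transform on the predictions. The sketch below is an assumption about how that could look, not something the model above already does; the sample values and the helper names (`fit_scaler`, `standardize`, `unstandardize`) are hypothetical.

```python
import statistics

def fit_scaler(values):
    """Return the mean and population standard deviation of the targets."""
    return statistics.fmean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Map targets to zero mean and unit variance."""
    return [(v - mean) / std for v in values]

def unstandardize(values, mean, std):
    """Map standardized predictions back to the original units."""
    return [v * std + mean for v in values]

# Hypothetical training targets for the first output.
out_1 = [0.0011, -0.0005, 0.0001, 0.0003]

# Fit on the training targets only, then reuse the same statistics everywhere.
mean_1, std_1 = fit_scaler(out_1)
scaled_1 = standardize(out_1, mean_1, std_1)
restored_1 = unstandardize(scaled_1, mean_1, std_1)

print("scaled targets:  ", [round(v, 3) for v in scaled_1])
print("restored targets:", [round(v, 6) for v in restored_1])
```

Each output would get its own mean and standard deviation. Metrics reported during training would then be in standardized units, so predictions need to be passed through `unstandardize` before computing an RMSE in the original units.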

From a mathematical point of view, the RMSE metric and MSE loss seem to fit my task well despite all these nuances. However, the current model performance is unsatisfactory even with 20K training samples.

So, my questions are:

  • Is RMSE an appropriate metric in my case?
  • Is MSE an appropriate loss function in my case?
  • Does it make sense to calculate the loss and metric separately for every output, as I'm doing?
  • Is it advisable to use the mean output values as constant bias initializers for each output dense layer to help the model converge faster?
  • Does the learning rate depend on the scale of the outputs? In my case the outputs are small numbers, so I wonder if my learning rate should adapt to that and go, say, below 1e-7.
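
For the bias-initializer question, a quick way to see what is at stake is to compare the MSE of a constant predictor at zero with one at the target mean. The plain-Python sketch below uses hypothetical target values; `mse_constant` is an illustrative helper, not a Keras function.

```python
import statistics

def mse_constant(y_true, constant):
    """MSE of a predictor that outputs the same constant for every sample."""
    return sum((constant - t) ** 2 for t in y_true) / len(y_true)

# Hypothetical targets for one output, all on the scale from the question.
out_1 = [0.0011, 0.0009, 0.0012, 0.0008]
mean_1 = statistics.fmean(out_1)

mse_zero = mse_constant(out_1, 0.0)     # output bias initialized to zero
mse_mean = mse_constant(out_1, mean_1)  # output bias initialized to the target mean

print(f"MSE with zero bias: {mse_zero:.3e}")
print(f"MSE with mean bias: {mse_mean:.3e}")
```

The gap between the two is exactly the squared mean of the targets, so the head start is largest when the targets sit far from zero relative to their spread. In Keras the mean could be supplied through `bias_initializer=tf.keras.initializers.Constant(mean_1)` on the output `Dense` layer; whether that noticeably speeds up convergence here is something to verify empirically. If the targets are already centered at zero, the two starting points coincide.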

Any answers / comments are appreciated!

