Appropriate loss function and metrics for regression task with mixed outputs

Question

SagRU

December 10, 2021, 06:44

I'm trying to train an EfficientNet-based Keras model that takes an image as input and returns two numeric values as output.

Here's the model:

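from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetB3
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def prepare_model_eff(input_shape):
    inputs = Input(shape=input_shape)
    # `trainable` is a layer/model attribute, so set it on the backbone itself;
    # setting it on the output tensor has no effect.
    backbone = EfficientNetB3(include_top=False, input_shape=input_shape)
    backbone.trainable = True
    x = backbone(inputs)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(rate=0.1)(x)
    x = layers.BatchNormalization()(x)
    # Two independent linear heads, one per regression target
    out_1 = layers.Dense(1, activation='linear', name='out_1')(x)
    out_2 = layers.Dense(1, activation='linear', name='out_2')(x)
    model = Model(inputs=inputs, outputs=[out_1, out_2])
    return model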

As far as I know, the most common metric for such tasks is Root Mean Square Error (RMSE):

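from tensorflow.keras import backend as K

def root_mean_squared_error(y_true, y_pred):
    # Square root of the mean squared difference over the batch
    return K.sqrt(K.mean(K.square(y_pred - y_true)))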

I also use Mean Squared Error as a loss function:

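from tensorflow.keras.optimizers import Adam

model.compile(
    loss={'out_1': 'mean_squared_error', 'out_2': 'mean_squared_error'},
    optimizer=Adam(learning_rate=0.001),
    # Use the metric function under the name it was defined with above
    metrics={'out_1': root_mean_squared_error, 'out_2': root_mean_squared_error},
)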

The problem is that my output values are quite unusual in several ways:

I have two numeric values (neurons) as output, not a single one
Both output values are pretty small numbers (e.g. 0.0003)
Output values could be both positive (0.0011, 0.0009), both negative (-0.0005, -0.0008) or mixed (0.0001, -0.0004)

From a mathematical point of view, the RMSE metric and MSE loss seem to fit my task well despite all these nuances. However, the current model performance is unsatisfactory even with 20K training samples.
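To put numbers on that scale, here is a minimal pure-Python sketch of MSE and RMSE for a trivial baseline that always predicts 0. The target values are hypothetical, taken from the example magnitudes above, and the helper names `mse` and `rmse` are mine, not part of the training code.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, i.e. the square root of mse()."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical targets at the magnitude described in the question
y_true = [0.0011, 0.0009, -0.0005, -0.0008, 0.0001, -0.0004]

# Trivial baseline: predict 0 for every sample
baseline = [0.0] * len(y_true)

baseline_mse = mse(y_true, baseline)
baseline_rmse = rmse(y_true, baseline)

print(f"baseline MSE  = {baseline_mse:.3e}")
print(f"baseline RMSE = {baseline_rmse:.3e}")
```

With targets this small, even the zero predictor gets an MSE below 1e-6, so a tiny absolute loss says little by itself. It probably only becomes informative when compared against a baseline like this one.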

So, my questions are:

Is RMSE an appropriate metric in my case?
Is MSE an appropriate loss function in my case?
Does it make sense to calculate the loss and metric separately for every output, as I'm doing?
Is it advisable to use the mean output values as constant bias initializers for each output dense layer to help the model converge faster?
Does the learning rate depend on the scale of the outputs? In my case the outputs are small numbers, so I wonder if my learning rate should adapt to that and go, say, below 1e-7.
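To make the last two questions concrete, here is a minimal sketch of the quantities involved: the per-output means that would serve as constant bias initializers, and a standardization of the targets that brings them to unit scale. The sample targets and the helpers `standardize` / `unstandardize` are hypothetical; this illustrates the idea rather than a tested recipe for this model.

```python
from statistics import fmean, pstdev

# Hypothetical (out_1, out_2) training targets at the scale described above
targets = [(0.0011, 0.0009), (-0.0005, -0.0008), (0.0001, -0.0004)]

out_1 = [t[0] for t in targets]
out_2 = [t[1] for t in targets]

# Per-output statistics computed on the training set only
mean_1, mean_2 = fmean(out_1), fmean(out_2)
std_1, std_2 = pstdev(out_1), pstdev(out_2)

def standardize(values, mean, std):
    """Map raw targets to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]

def unstandardize(values, mean, std):
    """Map standardized predictions back to the original scale."""
    return [v * std + mean for v in values]

scaled_1 = standardize(out_1, mean_1, std_1)
restored_1 = unstandardize(scaled_1, mean_1, std_1)

print(f"mean_1 = {mean_1:.3e}, mean_2 = {mean_2:.3e}")
print("scaled out_1:", [round(v, 3) for v in scaled_1])

# In Keras, the means could seed the output biases, for example:
#   layers.Dense(1, activation='linear', name='out_1',
#                bias_initializer=tf.keras.initializers.Constant(mean_1))
```

If the model were trained on standardized targets, its predictions would need to go through `unstandardize` before being compared with the raw values, and the loss would then be on a unit scale rather than around 1e-6.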

Any answers / comments are appreciated!

Topic rmse mse keras regression

Category Data Science
