Can the use of EarlyStopping() offset overfitting problems caused by validation_split?
Keras gives users the option, while fitting a model, to split the data into train/validation samples using the parameter validation_split.
Example:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, activation='relu'))
model.compile(optimizer='adam', loss='mse')  # compile step
model.fit(X_train, y_train, validation_split=0.2)
However, my intuition suggests that using validation_split (as opposed to creating train/test samples before fitting the model) will cause overfitting: although validation_split divides the data into train and validation sets at each epoch, the overall effect is that the entire dataset is 'seen' by the model.
I was wondering:

1. whether my intuition is correct, and
2. assuming (1) is true, whether there are any circumstances where using the EarlyStopping() callback together with validation_split would be better than splitting the data into train/test sets before fitting the model.
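For context, this is the kind of setup I have in mind: a minimal, self-contained sketch combining validation_split with EarlyStopping (the toy data, layer sizes, and patience value are placeholders, not from my real problem):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

# toy data standing in for X_train, y_train
rng = np.random.default_rng(0)
X_train = rng.random((100, 4))
y_train = rng.random(100)

model = Sequential()
model.add(Dense(3, activation='relu', input_shape=(4,)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# stop once val_loss has not improved for 5 consecutive epochs,
# and roll back to the weights from the best epoch
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)

history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=50, callbacks=[early_stop], verbose=0)
```

Here EarlyStopping monitors the loss on the 20% slice held out by validation_split, so whether that slice is a trustworthy signal is exactly what my question hinges on.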
Topic early-stopping keras cross-validation
Category Data Science