Loss becomes NaN after a short time in a time series classification

Here is my model code for binary classification of a time series:

import tensorflow as tf
from tensorflow import keras

def make_model(feature_columns, feature_layer_inputs):
  feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
  feature_layer_outputs = feature_layer(feature_layer_inputs)
  # add a time axis so that Conv1D receives (batch, steps, features)
  feature_layer_outputs = tf.expand_dims(feature_layer_outputs, 1)

  conv = keras.layers.Conv1D(filters=64, kernel_size=3, padding='same',
                             kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01))(feature_layer_outputs)
  conv = keras.layers.BatchNormalization()(conv)
  conv = keras.layers.ReLU()(conv)

  conv = keras.layers.Conv1D(filters=64, kernel_size=3, padding='same',
                             kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01))(conv)
  conv = keras.layers.BatchNormalization()(conv)
  conv = keras.layers.ReLU()(conv)

  conv = keras.layers.Conv1D(filters=64, kernel_size=3, padding='same',
                             kernel_regularizer=keras.regularizers.l1_l2(l1=0.01, l2=0.01))(conv)
  conv = keras.layers.BatchNormalization()(conv)
  conv = keras.layers.ReLU()(conv)
  conv = keras.layers.Dropout(0.25)(conv)

  gap = keras.layers.GlobalAveragePooling1D()(conv)

  output_layer = keras.layers.Dense(1, activation='softmax')(gap)

  return keras.models.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=output_layer)
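
For context, feature_layer_inputs is not defined inside the snippet; with DenseFeatures it is typically a dict mapping each raw column name to a keras.Input. A minimal sketch of how it can be built, assuming all features are numeric scalars:

feature_layer_inputs = {}
for name in numerical_features[1:]:  # every numeric column except the label
  feature_layer_inputs[name] = tf.keras.Input(shape=(1,), name=name)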

So I tried the following. At first the loss was NaN right from the start of training; I fixed that by applying a RobustScaler to the numeric values:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, RobustScaler, MinMaxScaler
dataframe = df
ct = ColumnTransformer([
        ('numeric', RobustScaler(), numerical_features[1:])
    ], remainder='passthrough')

dataframe = ct.fit_transform(dataframe)
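
One detail worth noting: fit_transform returns a plain NumPy array, so the column names are lost. A sketch of how to keep them, assuming sklearn >= 1.0 for get_feature_names_out:

import pandas as pd

# ColumnTransformer reorders the output: scaled numeric columns first, then the
# passthrough remainder (which includes the label); get_feature_names_out
# returns the names in that new order, prefixed per transformer.
dataframe = pd.DataFrame(ct.fit_transform(df), columns=ct.get_feature_names_out())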

numerical_features[0] is my label. I also tried adding keras.regularizers.l1_l2 to fix the NaN loss problem, and adding dropout after each ReLU layer (right now it is only after the last one). I also tried different losses such as MSE and binary cross-entropy, and both sigmoid and softmax in the output layer. I'm pretty much clueless at this point and I have the feeling I'm missing something rather basic. Does anyone have an idea?
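
Spelled out, the sigmoid plus binary cross-entropy combination mentioned above corresponds to a compile step like this (the Adam optimizer is an assumption; the loss and activation are the ones from the question):

model = make_model(feature_columns, feature_layer_inputs)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(),  # expects a single sigmoid output
              metrics=['accuracy'])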

Topic: binary-classification, keras, loss-function, classification, time-series

Category: Data Science


In my experience, the most common cause of NaN loss is a validation batch that contains 0 instances. It is possible that you have some calculation based, for example, on averaging the loss over several time stamps, and one of those time stamps has 0 instances, causing a cascade of NaN values.

Check your validation set carefully, and check how the loss is calculated on it.
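
A quick way to test this, and to catch the NaN as early as possible, is to assert that the data and the validation set are sane before training, and to stop at the first NaN loss. A minimal sketch, assuming the features and labels have already been extracted into NumPy arrays X_train, y_train, X_val, y_val (placeholder names, not from the question):

import numpy as np
import tensorflow as tf

# Rule out non-finite inputs and an empty validation set before training:
for name, arr in [('X_train', X_train), ('y_train', y_train),
                  ('X_val', X_val), ('y_val', y_val)]:
    assert len(arr) > 0, name + ' is empty'
    assert np.all(np.isfinite(arr)), name + ' contains NaN or inf'

# TerminateOnNaN is a built-in Keras callback that stops training as soon as
# the loss becomes NaN, which makes the offending batch much easier to pin down:
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])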
