Discouraging values or smoothing out results when model fitting

I'm training a network to do direction-of-arrival prediction, and I'm running into the issue that no matter what my network is (ResNet 18 through 101, CRNN, CNN, etc.), my results tend toward one small range of values, as seen in the image below,

which obviously leads to the following errors:

I have attempted to just wait it out until the network finally learns, but my validation loss diverges almost immediately. An example can be seen below.

The strange thing is that even my training loss doesn't go to 0. I would expect that if my network were overfitting, it would simply learn my dataset perfectly, but that doesn't happen, regardless of how complex I make the model. The only explanations I can think of are that my feature representation is completely nonsensical, or that I have a typo somewhere in my training function and something bizarre is happening.
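
As a sanity check for this, one thing worth trying is overfitting a single fixed batch: a correct training loop should drive the loss on one batch toward 0, and if it can't, the problem is likely in the loss/target shapes or the feature pipeline rather than the model. A minimal sketch, with model, criterion, optimizer, train_loader and device standing in for the corresponding attributes used in the training method further down:

# Sanity check (sketch): repeatedly fit one fixed batch.
inputs, labels = next(iter(train_loader))
inputs, labels = inputs.to(device), labels.to(device)
model.train()
for step in range(500):
    optimizer.zero_grad()
    outputs = model(inputs)
    # Watch the shapes here: if outputs.squeeze() is (B,) while labels
    # is (B, 1), nn.MSELoss will broadcast to (B, B) and silently
    # average the wrong thing.
    loss = criterion(outputs.squeeze(), labels.squeeze())
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())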

I've tried varying the loss function and different activation functions at the end of the network, such as tanh, sigmoid, ReLU, and no activation function at all. At the moment I've simplified my training data as much as possible and am working with an 8-channel, 1 s long chirp signal, which can be found (at least temporarily) here: https://file.re/2021/06/20/chirp/
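
For reference, the difference between the output heads I've been swapping looks roughly like this (a sketch only; in_dim is a placeholder for the backbone's feature size). One thing a bounded activation such as tanh can do is saturate and pin the outputs to a narrow band if the targets aren't scaled to match:

import torch
import torch.nn as nn

in_dim = 512  # placeholder: feature size coming out of the backbone

# (a) no output activation: the head can reach any target value directly
linear_head = nn.Linear(in_dim, 1)

# (b) bounded output: tanh in (-1, 1) rescaled to a degree range; if the
# targets live in a different range (e.g. [0, 360)), the head saturates
# and predictions cluster near its bounds
class ScaledTanhHead(nn.Module):
    def __init__(self, in_dim, max_abs_angle=180.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, 1)
        self.max_abs_angle = max_abs_angle

    def forward(self, x):
        return torch.tanh(self.fc(x)) * self.max_abs_angle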

As mentioned above, I've tried standard ResNets of all sizes and various feature representations, the most recent being taking the complex STFT of all 8 channels, stacking the magnitudes vertically, and appending the angle (phase) information along the x-axis, as seen below:
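
A minimal sketch of how that representation can be built with torch.stft, assuming the waveform is an (8, num_samples) tensor (the n_fft and hop values here are placeholders):

import torch

def make_feature(wave, n_fft=512, hop=256):
    # wave: (8, num_samples) multichannel signal
    window = torch.hann_window(n_fft)
    spec = torch.stft(wave, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)  # (8, F, T) complex
    mag = spec.abs()      # (8, F, T) magnitudes
    phase = spec.angle()  # (8, F, T) phase angles
    # stack the 8 channels vertically along the frequency axis ...
    mag = mag.reshape(-1, mag.shape[-1])        # (8*F, T)
    phase = phase.reshape(-1, phase.shape[-1])  # (8*F, T)
    # ... and append the angle information along the time (x) axis
    return torch.cat([mag, phase], dim=-1)      # (8*F, 2*T)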

If anybody has any ideas, I'm more than happy to try them. At the moment I'm using a CNN with vertical convolutions and no pooling operations, in an attempt to preserve the time information.
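
For illustration, the kind of network I mean (a rough sketch, not my exact model): kernels of shape (k, 1) with stride 1 along the time axis, so the time dimension passes through unchanged until the final head:

import torch.nn as nn

class VerticalCNN(nn.Module):
    # Convolutions span only the frequency (vertical) axis: kernel (7, 1),
    # stride (2, 1), so the time resolution is untouched in the feature stack.
    def __init__(self, out_dim=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(7, 1), stride=(2, 1)),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=(7, 1), stride=(2, 1)),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # time is only collapsed here, at the very end
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, x):  # x: (batch, 1, 8 * freq_bins, frames)
        return self.head(self.features(x))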

My main training method can be seen below:

def train(self):
    steps, losses, metrics = TrainingUtilities.get_training_variables(self.parameters)
    patience_counter = 0
    best_epoch_data = None
    best_epoch_validation_loss = float("inf")
    best_epoch = 0
    exit_training = False
    try:
        for epoch in range(steps, self.epoch_count):
            epoch_metrics = TrainingUtilities.initialize_metrics(self.mode)
            if exit_training:
                break
            # run a training pass and a validation pass each epoch
            for idx, phase in enumerate(['train', 'val']):
                if phase == 'train':
                    self.model.train()
                else:
                    self.model.eval()
                for data in tqdm(self.data_loaders[idx]):
                    self.optimizer.zero_grad()
                    inputs, labels = data

                    outputs = self.model(inputs.to(self.device))
                    # labels = azi_class.squeeze_().to(self.device)
                    loss = self.criterion(outputs.squeeze(), labels.to(self.device))

                    epoch_metrics = TrainingUtilities.get_epoch_metrics(
                        outputs, labels, loss, epoch_metrics, phase, self.mode)

                    # only update weights on the training pass
                    if phase == 'train':
                        loss.backward()
                        self.optimizer.step()

                TrainingUtilities.report_metrics(self.writer, epoch_metrics, epoch, phase, self.parameters, self.mode)
                # step the LR scheduler on the mean validation loss
                if phase == 'val':
                    TrainingUtilities.step_scheduler(
                        self.scheduler, np.mean(epoch_metrics[0][phase]), self.parameters)

            losses.append(epoch_metrics[0])
            metrics.append(epoch_metrics)
            if epoch % self.epoch_save_count == 0:
                TrainingUtilities.save_checkpoint(self.model, losses, metrics, self.training_dir, epoch, self.mode, self.model_name, self.size)

            steps += 1

    except (KeyboardInterrupt, RuntimeError) as error:
        # save a checkpoint before exiting on interrupt or runtime error
        print(f"Error: {error}")
        TrainingUtilities.save_checkpoint(self.model, losses, metrics, self.training_dir, steps, self.mode, self.model_name, self.size)

It's very possible that I've just missed something, but the main idea is, I think, quite simple. I've abstracted a lot of things because I've tested many different models on different datasets and wanted to be able to switch easily between classification and regression.

Topic: audio-recognition, regression, deep-learning

Category: Data Science


I can't access the file. If you can't share the whole thing, could you share a small sample?

Some ideas:

1- Did you try normalizing your data? Data with unusual value ranges often needs to be normalized before a network can train on it properly (see the first sketch after this list).

2- There could also be an issue in the training itself, perhaps overfitting. Did you apply dropout to improve generalization (second sketch below)?

3- If you get the same error with every model you use, something may be wrong in your pipeline: for example, an initial data transformation that you apply during training but not to the prediction data (third sketch below).
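
For point 1, a minimal standardization sketch (train_features is a placeholder for your stacked training features; the statistics must come from the training split only):

import numpy as np

# per-feature standardization: zero mean, unit variance
mean = train_features.mean(axis=0)
std = train_features.std(axis=0) + 1e-8  # avoid division by zero

def normalize(x):
    return (x - mean) / std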
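
For point 2, a sketch of adding dropout in front of the final layer (backbone and feat_dim are placeholders for whatever network you're using):

import torch.nn as nn

model = nn.Sequential(
    backbone,             # placeholder: any feature extractor
    nn.Flatten(),
    nn.Dropout(p=0.5),    # randomly zero features during training only
    nn.Linear(feat_dim, 1),
)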
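
For point 3, a sketch of persisting the training-time transform so prediction uses exactly the same one (all names here are illustrative):

import numpy as np

# save the statistics computed on the training split
np.savez("feature_stats.npz", mean=mean, std=std)

def preprocess(x, stats):
    return (x - stats["mean"]) / stats["std"]

# at prediction time: load and apply the identical transform
stats = np.load("feature_stats.npz")
x_ready = preprocess(x_new, stats)  # x_new: the raw prediction input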
