Gradient and loss calculation localization in Vision Transformers

Hi all, I'm turning to you to figure out where the gradient and loss computation that updates the q, k, v weights happens in Vision Transformers. I suspect it is the MLP/feed-forward part of the architecture, but I am not entirely sure. I attach some code from lucidrains:

import torch
from torch import nn
from einops import rearrange, repeat
from einops.layers.torch import Rearrange

# helpers

def pair(t):
    return t if isinstance(t, tuple) else (t, t)

# classes

class PreNorm(nn.Module):
    def __init__(self, dim, …
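In case a sketch helps frame the question: in most ViT implementations the q, k, v weights live in a single nn.Linear inside the attention block (to_qkv in the lucidrains code), and their gradients can be inspected directly after a backward pass. A minimal sketch with a toy attention module (not lucidrains' exact class):

import torch
from torch import nn

# Toy attention block with a combined q,k,v projection, mirroring
# the to_qkv layer found in many ViT implementations.
class ToyAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)
        return self.proj(attn @ v)

attn_block = ToyAttention(dim=8)
x = torch.randn(2, 4, 8)
loss = attn_block(x).sum()   # stand-in for the real loss at the classification head
loss.backward()

# A non-zero gradient here shows the q,k,v weights receive updates
# through the attention block itself, not only via the MLP/FF part.
print(attn_block.to_qkv.weight.grad.abs().mean())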
Category: Data Science

Regression problem with Deep Learning

I'm working on the Housing Price dataset, where the target is to predict the housing price. The price of a house is always positive, but as I understand it, the model could still predict a negative value for some samples. If that's correct, is there any way to constrain the training so that the model always predicts a positive value? As in the classification case we use the Sigmoid/Softmax activation function …
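One common way to guarantee positive outputs, sketched here under the assumption of a Keras regression model (the layer sizes and n_features are made up): give the output layer a softplus activation, which maps any real number to (0, inf).

from tensorflow.keras import layers, Sequential

n_features = 13  # hypothetical; set to the actual number of input features

model = Sequential([
    layers.Dense(64, activation='relu', input_shape=(n_features,)),
    layers.Dense(64, activation='relu'),
    # softplus keeps predictions strictly positive; 'relu' also works
    # but can get stuck at exactly 0.
    layers.Dense(1, activation='softplus'),
])
model.compile(optimizer='adam', loss='mse')

An alternative is to train on log-transformed prices and exponentiate the predictions, which also makes the error scale multiplicative.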
Category: Data Science

Keras loss object and shapes

I'm at a loss. I've been staring at this problem for a while and I'm unsure how to proceed. I've been constructing a script to train a model for object detection based on a dataset I've compiled. I've been going along with some example scripts and modifying some code. Here is my code:

import os
from tempfile import gettempdir

import tensorflow as tf
from tensorflow.keras import layers, Model, Sequential
import numpy as np
from clearml import Task, Dataset, TaskTypes

def …
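Since the issue is about the shapes the loss object sees, a small debugging wrapper can make them visible; this is a generic sketch (debug_loss is a hypothetical helper, not part of the script above):

import tensorflow as tf

# Wrap any Keras loss so it prints the shapes it actually receives.
def debug_loss(loss_fn):
    def wrapped(y_true, y_pred):
        tf.print("y_true shape:", tf.shape(y_true),
                 "y_pred shape:", tf.shape(y_pred))
        return loss_fn(y_true, y_pred)
    return wrapped

# Usage: model.compile(optimizer='adam',
#                      loss=debug_loss(tf.keras.losses.Huber()))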
Category: Data Science

HuggingFace Transformers is giving loss: nan - accuracy: 0.0000e+00

I am a HuggingFace newbie and I am fine-tuning a BERT model (distilbert-base-cased) using the Transformers library, but the training loss is not going down; instead I am getting loss: nan - accuracy: 0.0000e+00. My code largely follows the boilerplate from the [HuggingFace course][1]:

model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
opt = Adam(learning_rate=lr_scheduler)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
model.fit(
    encoded_train.data,
    np.array(y_train),
    validation_data=(encoded_val.data, np.array(y_val)),
    batch_size=8,
    epochs=3
)

Where my loss function is:

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

The learning rate is calculated like so:

lr_scheduler …
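For what it's worth, loss: nan with SparseCategoricalCrossentropy(from_logits=True) is very often a label problem: with num_labels=3 every label must be an integer in {0, 1, 2}. A quick sanity check, reusing y_train from the snippet above:

import numpy as np

y = np.array(y_train)
print(np.unique(y))  # should show a subset of [0 1 2]
assert y.min() >= 0 and y.max() <= 2, "labels outside [0, 2] yield nan loss"

An overly aggressive learning-rate schedule can produce the same symptom, so printing the scheduler's first few values is also worth a try.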
Category: Data Science

Why is my validation loss never increasing?

I am currently training different neural networks for binary classification of images. When using logistic regression, my validation loss never increases, not even after 5000 epochs. I thought that at some point overfitting kicks in and the validation loss always starts to increase. Does anybody know why this does not happen?
Category: Data Science

Loss stuck for regression model

I'm training a model that returns 2 parameters. These two parameters are used for classical image processing: a threshold for the Kirsch operator and the number of iterations for a bilateral filter. The model trains on 300 representative images, along with both parameters, which were determined manually. I am currently using ResNet-18, a convolutional model adapted for regression: the fully connected layer is changed to output 2 nodes. As the loss function I've chosen mean squared error. ReduceLROnPlateau is used as a learning rate …
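One thing worth checking in this setup: the two targets (a Kirsch threshold and an iteration count) likely live on very different numeric scales, which can stall an MSE loss. A hedged sketch of standardizing the targets first, assuming they sit in a (300, 2) NumPy array:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Column 0: Kirsch threshold, column 1: bilateral filter iterations
# (stand-in values; use the manually determined labels here).
y = np.column_stack([np.random.uniform(0, 255, 300),
                     np.random.randint(1, 10, 300)])

scaler = StandardScaler()
y_scaled = scaler.fit_transform(y)  # train the network against y_scaled

# At inference time, map predictions back to the original units:
# y_pred = scaler.inverse_transform(model(x).detach().cpu().numpy())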
Category: Data Science

Interpreting Categorical Crossentropy Loss

I would like to ask for clarification about the loss values output during training when using categorical crossentropy as the loss function. If I have 11 categories and my loss is (for the sake of argument) 2, does this mean that my model is on average 2 categories off the correct category, or is the loss purely for comparative purposes and cannot be interpreted the way I am suggesting?
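For what it's worth, the loss is not measured in "categories off": categorical crossentropy is the negative log of the probability the model assigns to the true class, so it can be inverted to recover that probability. In numbers:

import math

loss = 2.0
p_true = math.exp(-loss)  # crossentropy = -log(p_true)
print(p_true)  # ~0.135, versus 1/11 ~ 0.091 for random guessing over 11 classes

So a loss of 2 means the model puts roughly 13.5% probability on the correct category on average; the value says nothing about how "far" the wrong categories are.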
Category: Data Science

High loss but low RMSE, how?

I have trained an LSTM model on a dataset, but its loss during training is ten times the RMSE during testing. How is this possible, and can I use this model if the RMSE is very low but the loss is high? How can I improve the training and test loss?
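One common explanation, sketched with made-up numbers: if the training loss is MSE while the test metric is RMSE, they are on different scales, since RMSE = sqrt(MSE), and a tenfold gap can be purely a question of units.

import numpy as np

mse_loss = 100.0          # hypothetical training loss (MSE)
rmse = np.sqrt(mse_loss)  # 10.0 - exactly a tenfold "gap" from units alone
print(mse_loss / rmse)    # 10.0

# Compare like with like: np.sqrt(train_mse) against test RMSE,
# or train MSE against test RMSE ** 2.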
Category: Data Science

Should the model be defined again before training it to new data?

I wanted to fit an LSTM model on a new data set in a loop, so I have implemented it like this:

#................................define model...........................
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_input, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()

for k, v in enumerate(nse.get_fno_lot_sizes()):
    if v not in ('^NSEI', 'NIFTYMIDCAP150.NS', 'NIFTY_FIN_SERVICE.NS', '^NSEBANK'):
        #-----------Create Training--------------------
        train = df[['close']].iloc[:int(len(df)*0.8)]
        scaler = MinMaxScaler()
        scaler.fit(train)
        scaled_train = scaler.transform(train)
        #------------------------------------------------------
        generator = TimeseriesGenerator(scaled_train, scaled_train, length=n_input, batch_size=1)
        #-----------------------------------------------------
        # fit model
        model.fit(generator, epochs=10)

Or should the model definition be inside the for loop? I am asking this because I do …
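If each symbol should get its own independently trained model, the definition and compile step do need to move inside the loop; otherwise every iteration keeps fine-tuning the same weights. A sketch of the loop-internal variant, reusing nse, df, n_input and n_features from the snippet above:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import MinMaxScaler

for k, v in enumerate(nse.get_fno_lot_sizes()):
    if v not in ('^NSEI', 'NIFTYMIDCAP150.NS', 'NIFTY_FIN_SERVICE.NS', '^NSEBANK'):
        # Re-defining the model here re-initializes its weights,
        # so each symbol starts training from scratch.
        model = Sequential()
        model.add(LSTM(100, activation='relu', input_shape=(n_input, n_features)))
        model.add(Dense(1))
        model.compile(optimizer='adam', loss='mse')

        train = df[['close']].iloc[:int(len(df) * 0.8)]
        scaled_train = MinMaxScaler().fit_transform(train)
        generator = TimeseriesGenerator(scaled_train, scaled_train,
                                        length=n_input, batch_size=1)
        model.fit(generator, epochs=10)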
Category: Data Science

Custom loss function for regression

I am trying to write a custom loss function for a machine learning regression task. What I want to accomplish is the following:

- Reward higher preds, higher targets
- Punish higher preds, lower targets
- Ignore lower preds, lower targets
- Ignore lower preds, higher targets

All ideas are welcome; pseudo code or Python code works for me. This is what I tried so far. It does not work so well; I think it is because it does not take high targets into …
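One hedged way to encode those four cases in PyTorch, treating "higher/lower" as being above or below a cutoff (the cutoff and the linear penalty are assumptions to make the sketch concrete):

import torch

def asymmetric_loss(pred, target, cutoff=0.0):
    high_pred = pred > cutoff
    high_target = target > cutoff

    # Reward: higher pred AND higher target -> negative contribution.
    reward = -(pred - cutoff) * (high_pred & high_target).float()
    # Punish: higher pred AND lower target -> positive contribution.
    punish = (pred - cutoff) * (high_pred & ~high_target).float()
    # Lower preds contribute nothing, covering the two "ignore" cases.
    return (reward + punish).mean()

# loss = asymmetric_loss(model(x).squeeze(), y); loss.backward()

Note the reward term is unbounded, so in practice it needs a cap (or a tanh squashing); otherwise the model can drive all high-target predictions toward infinity.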
Category: Data Science

Logarithmic scale for a learning curve

I'm plotting the learning curve with Python using the following code:

import matplotlib.pyplot as plt
import seaborn as sns
import csv
import pandas as pd

sns.set(style='darkgrid')

# Increase the plot size and font size.
sns.set(font_scale=1.5)
plt.rcParams["figure.figsize"] = (12, 6)

plt.plot(lst, 'r')
plt.legend(["Validation Loss"])

# Label the plot.
plt.title("RNN deltat")
plt.xlabel("Epoch")
plt.ylabel("Loss")

The curve looks like this: [learning-curve plot omitted] The lecturer said it would be better to try it on a logarithmic scale. Can you please help me apply the logarithm here?
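Matplotlib can log-scale the y-axis directly, so no manual transformation of lst is needed; a minimal sketch continuing from the code above:

plt.plot(lst, 'r')
plt.legend(["Validation Loss"])
plt.title("RNN deltat")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.yscale('log')  # plt.semilogy(lst) is an equivalent shortcut
plt.show()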
Category: Data Science

Decreasing Learning Rate doesn't improve the results

In theory, and in what people actually do (e.g. the linked paper), decreasing the learning rate should help the optimizer go "deeper into the valley" and thus decrease the loss and increase the metric. My plan was therefore to train a neural network with a learning rate of 1 until the loss and my metric stay approximately the same for some epochs, then with 0.1, then 0.01, and so on. However, what I'm observing is that the loss of the model stagnates …
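For reference, this staircase schedule can be automated in Keras; a hedged sketch assuming a compiled model (the patience and floor values are made up):

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Divide the learning rate by 10 whenever validation loss has
# stagnated for 5 epochs - the same 1 -> 0.1 -> 0.01 staircase,
# but triggered automatically.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                              patience=5, min_lr=1e-6)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[reduce_lr])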
Category: Data Science

Choosing the parameters of a neural network for (1, 478)-dimensional input

Colleagues, I am actually kind of new to NNs, but I'm trying hard. I have data:

Index: 40073 entries (excluded from training, UID)
Columns: 484 entries
dtypes: bool(468), float64(2), int64(13), object(1)

I used only 478 features. The y is moneySpend, which can be >= 0. The code is below:

newDropped = df.drop(["moneySpend", "userAgent", "secondsToBuy", "hoursToBuy", "daysToBuy", "platform"], axis=1)
x_train, x_test, y_train, y_test = train_test_split(newDropped, df["moneySpend"], test_size=0.25, random_state=547)

model = Sequential()
model.add(Dense(16, input_dim=478, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
tb_callback …
Category: Data Science

NeMo Conformer-CTC Predicts Same Word Repeatedly When Fine-Tuning

I'm using the NeMo Conformer-CTC small model on the LibriSpeech dataset (the clean subset, around 29K inputs, using 90% for training and 10% for testing), with PyTorch Lightning. When I train from scratch, the model learns 1 or 2 sentences in 50 epochs and gets stuck at a loss of 60-something (I trained it for 200 epochs too and it didn't budge). But when I try to fine-tune it using a pre-trained model from the toolkit, it predicts correctly …
Category: Data Science

Training Loss increases, but Validation Loss decreases

I am fine-tuning a T5 transformer model on a sequence-to-sequence task. My program outputs the training and validation loss every 500 optimization steps. However, when I first started training the model, the training loss steeply increased while my validation loss decreased (my training dataset has about 85,000 samples and my validation dataset has about 10,000 samples)! Does anyone know why this might be happening? Is this a sign my model is not learning properly? Also, does anyone know …
Category: Data Science

Why are the accuracy and loss results of an ANN model inconsistent?

I trained a model based on an ANN; the accuracy is 94.65% almost every time, while the loss is 12.06%. Now my question is: shouldn't the loss of the model be (100 - 94 = 6%) or near it? Why is it 12% when the accuracy is 94%?

• ANN model specification:
Trained and tested data = 96,465 (training data = 80%, testing data = 20%)
1 input layer = 5 nodes, 2 hidden layers = 24 nodes each, …
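Accuracy and loss are computed from different things: accuracy only checks the argmax, while a crossentropy loss looks at the predicted probabilities, so loss is not 100 minus accuracy. A small illustration with made-up predictions:

import numpy as np

# Two predictions, both correct by argmax (so accuracy is identical),
# but made with different confidence - hence different loss.
print(-np.log(0.97))  # ~0.03
print(-np.log(0.60))  # ~0.51: same accuracy, very different loss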
Category: Data Science

Q values loss per episode and mean absolute error

I am new to deep reinforcement learning! I am following this code for my adaptation problem (choosing actions): https://github.com/jaromiru/AI-blog/blob/master/CartPole-DQN.py I am wondering how I can evaluate the training. I already log the average rewards, but how can I get the average Q values, the loss per episode, and the mean absolute error, so I can evaluate my agent? I will be grateful if you can help me!
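A hedged sketch of the usual bookkeeping, assuming the Keras-based Brain/Agent structure of the linked script (brain.model, and x, y as the replay batch; treat these names as assumptions):

import numpy as np

episode_losses, episode_q = [], []

# Inside the episode loop, after each replay/training step:
history = brain.model.fit(x, y, batch_size=64, verbose=0)
episode_losses.append(history.history['loss'][0])            # loss of this step
episode_q.append(np.max(brain.model.predict(x, verbose=0)))  # greedy Q estimate

# At the end of each episode:
print("mean loss:", np.mean(episode_losses))
print("mean max-Q:", np.mean(episode_q))
print("MAE:", np.mean(np.abs(y - brain.model.predict(x, verbose=0))))

Alternatively, compiling the model with metrics=['mae'] makes Keras report the mean absolute error alongside the loss on every fit call.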
Category: Data Science

Training loss = 0, training accuracy =1, validation and test around 85%

I have created different CNNs for image classification. The dataset is this: https://www.kaggle.com/crowww/a-large-scale-fish-dataset There are 9 classes, and each class contains 1000 images of fish. I split it into training (800 images per class), validation (100) and test (100). I created different CNNs with these layers:

1) 1 convolutional layer (conv, relu, batchnorm) + 2 fully connected layers + output
2) 2 convolutional layers (conv, relu, batchnorm and maxpooling) + 2 fully connected layers + output
3) 4 convolutional layers (conv, relu, batchnorm …
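A training loss of 0 with training accuracy of 1 while validation sits near 85% is classic overfitting, and augmentation plus dropout are the usual first countermeasures. A hedged Keras sketch (input size, filter counts and dropout rate are made-up values):

from tensorflow.keras import layers, Sequential

model = Sequential([
    # Augmentation runs only during training and perturbs each image,
    # so the network cannot simply memorize the 800 images per class.
    layers.RandomFlip("horizontal", input_shape=(224, 224, 3)),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.Conv2D(32, 3, activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),  # discourages memorization in the dense head
    layers.Dense(128, activation='relu'),
    layers.Dense(9, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])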
Category: Data Science

How to calculate MAE and threshold in a multivariate time series

I'm trying to understand how to calculate the MAE in my time series, and then the threshold to determine which of my data in the test set are anomalies. I'm following this tutorial, which is based on a univariate time series, and they calculate it in the following way:

# Get train MAE loss.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)

I have a dataset structured as follows:

device1  device2  device3  ...  device30
0.20     0.35     0.12     ...  0.56
1.20 …
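For the multivariate case, one hedged option is to keep the same formula but average the absolute error over both the time axis and the 30 device columns, then derive the threshold from the training-error distribution (the 99th percentile is an assumption; a maximum over training errors is another common choice):

import numpy as np

# x_train has shape (samples, timesteps, 30); averaging over axes 1 and 2
# yields one reconstruction MAE per sample window.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=(1, 2))

# Threshold from the distribution of training errors.
threshold = np.percentile(train_mae_loss, 99)

x_test_pred = model.predict(x_test)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=(1, 2))
anomalies = test_mae_loss > threshold  # boolean mask over test windows
print(anomalies.sum(), "anomalous windows")

If a per-device view is needed, averaging over the time axis only (axis=1) keeps a separate MAE, and hence a separate threshold, for each of the 30 devices.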
Category: Data Science
