Do I need to rescale input labels before training (label values between 20..51)?

I'm trying to build a model for this dataset (age prediction): the input image has the shape 3, 128, 128 and the predicted labels (ages) range from 20 to 51. I want to build a model and train it with MSE and R2 metrics. I built the following model: def GetPretrainedModel(): oModel = torchvision.models.resnet50(pretrained=True) for mParam in oModel.parameters(): if False == isinstance(mParam, nn.BatchNorm2d): mParam.requires_grad = False dIn = oModel.fc.in_features oModel.fc = nn.Sequential( nn.Linear(dIn, 512), nn.ReLU(), nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, …
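One common option for a regression target in a narrow range such as 20 to 51 is to standardize the labels before training and invert the transform at prediction time. The following is a minimal sketch under that assumption, using plain Python lists; the helper names (`fit_label_scaler`, `scale`, `unscale`) and the sample ages are hypothetical and do not come from the question.

```python
from statistics import mean, pstdev


def fit_label_scaler(labels):
    """Return (mean, std) computed on the training labels only."""
    mu = mean(labels)
    sigma = pstdev(labels)
    if sigma == 0:
        raise ValueError("labels are constant; nothing to scale")
    return mu, sigma


def scale(labels, mu, sigma):
    """Map raw labels to zero mean and unit variance."""
    return [(y - mu) / sigma for y in labels]


def unscale(preds, mu, sigma):
    """Map model outputs back to the original label units (ages)."""
    return [p * sigma + mu for p in preds]


# Hypothetical ages in the 20..51 range mentioned in the question.
ages = [20, 25, 31, 40, 51]
mu, sigma = fit_label_scaler(ages)
scaled = scale(ages, mu, sigma)
restored = unscale(scaled, mu, sigma)
```

If the model is trained on the scaled labels, an MSE reported during training is in scaled units; predictions need `unscale` before they can be read as ages.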
Trying to implement a loss function read from a journal-article in python

Computer science undergrad here. I am trying to understand Eqn 12 from this paper so that I can implement it in Python code. In this paper, the NN model takes a blurred image as input and outputs a sharp (deblurred) image and the kernel that can produce the same blurred image after multiplying with the sharp image. Here: $\widetilde{K_t}$ = predicted kernel matrix; $K_t^{train}$ = ground truth kernel (for training) matrix; $\widetilde{X_t}$ = predicted sharp image matrix; $X_t^{train}$ = …
how to calculate loss function?

I hope you are doing well. I want to ask a question regarding the loss function in a neural network. I know that the loss function is calculated for each data point in the training set, and then backpropagation is done depending on whether we are using batch gradient descent (backpropagation is done after all the data points are passed), mini-batch gradient descent (backpropagation is done after each batch) or stochastic gradient descent (backpropagation is done after each data point). …
Match between objective function and evaluation metric

Does the objective function for model fitting and the evaluation metric for model validation need to be identical throughout the hyperparameter search process? For example, can an XGBoost model be fitted with the Mean Squared Error (MSE) as the objective function (setting the 'objective' argument to reg:squarederror: regression with squared loss), while the cross validation process is evaluated based on a significantly different metric such as the gamma-deviance (residual deviance for gamma regression)? Or should the evaluation metric match the …
Batch Size influences R2 score a lot, but not MSE (much)

If I train a model following a random search (and in general for this problem I am working on), a big batch size seems to control the R2 score: bs=200 or more gives, roughly, R2 scores of 0.95 or above and an MSE of about 0.012. If I lower the batch size, MSE may decrease a little faster (I think) except that the R2 score blows up (to -5692.7026, say, and thereabouts). E.g. 97256/100664 [===========================>..] - ETA: 6s - …
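One possible explanation, assuming the R2 metric is computed per batch and then averaged (an assumption, since the excerpt does not show the metric code), is that a small batch can contain targets with almost no variance. R2 divides by the target variance of the batch, so it can become hugely negative while MSE stays small. A minimal sketch with made-up numbers:

```python
def mse(y_true, y_pred):
    """Mean squared error over one batch."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination over one batch: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical batch whose targets are spread out.
wide_true = [0.1, 0.4, 0.6, 0.9]
wide_pred = [0.2, 0.3, 0.65, 0.85]

# Hypothetical batch whose targets are nearly constant.
narrow_true = [0.500, 0.501, 0.499, 0.500]
narrow_pred = [0.60, 0.40, 0.55, 0.45]

wide_mse, wide_r2 = mse(wide_true, wide_pred), r2(wide_true, wide_pred)
narrow_mse, narrow_r2 = mse(narrow_true, narrow_pred), r2(narrow_true, narrow_pred)
```

Both batches have a similar, small MSE, but the nearly constant batch yields a very large negative R2 because SS_tot is close to zero. Computing R2 once over the whole validation set avoids this effect.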
What causes explosion in MSE when training?

I have (probably) heavily overfitted/overtrained a model, but I was curious as to what might cause this type of behaviour. I carried on training (Epoch 1/50 is not the first epoch of training this model). You can see the MSE (loss) is very low. It slowly decreases over epochs 1-40, then it soon explodes. What causes this type of behaviour when training models? 55706/55706 [=======] - 109s 2ms/step - loss: 0.0059 - coeff_determination: 0.9688 … Epoch 5/50 55706/55706 [=======] - …
weighted mse - weights as function of time

I am predicting time series data using an LSTM (in TensorFlow). Currently I am using MSE as my metric of choice. I would like to create my own custom weighted MSE metric, such that the weights are a decreasing function of the index, that is, to put more weight on earlier time steps (earlier predictions will be better). To elaborate on my problem definition: I am trying to predict $y_1, \dots, y_n$ and would like to take into account $n$. My …
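A weighted MSE with weights that decrease in the time index can be sketched as follows. This is a plain-Python illustration, not a TensorFlow metric; the exponential decay schedule, the `decay` parameter and both function names are assumptions made for the example.

```python
def decaying_weights(n, decay=0.9):
    """Weights proportional to decay**i for i = 0..n-1, normalized to sum to 1."""
    raw = [decay ** i for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mse(y_true, y_pred, weights):
    """Sum of w_i * (y_i - yhat_i)^2 over the prediction horizon."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have the same length")
    return sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))


weights = decaying_weights(3)
# The same unit error costs more at the first step than at the last one.
early_error = weighted_mse([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], weights)
late_error = weighted_mse([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], weights)
```

Because the weights are normalized, the result stays on the same scale as an ordinary MSE when all errors are equal; any other decreasing schedule (for example 1/(i+1)) could be substituted in `decaying_weights`.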
Increasing (negative) R2 coincident with decreasing (positive) MSE during hyper parameter optimisation

I have a densely connected NN and I'm running a hyper parameter optimisation for multi-target output. During hyper parameter optimisation training, each epoch KerasTuner focuses on val_loss. During training I can see that I have absurdly large negative R2 values (basically a terribly fitted model), that decrease to 0 (and hopefully continue to 1) mostly whilst MSE drops too. Occasionally I'll get extremely large (negative) jumps back up in the R2_val score, whilst all other metrics decrease. (including R2_train score) …
How to extract MSEP or RMSEP from lassoCV?

I'm doing lasso and ridge regression in R with the package chemometrics. With ridgeCV it is easy to extract the SEP and MSEP values with model.ridge$RMSEP and model.ridge$SEP. But how can I do this with lassoCV? model.lasso$SEP works, but there is no RMSE or MSE entry in the list. However, the function provides a plot with MSEP and SEP in the legend, so it must be possible to extract both values. But how? SEP = standard error of the predictions; …
Finding a vector that minimize the MSE of its linear combination

I have been doing a COVID-19 related project. Here is the question: N = vector of daily new infected cases; D = vector of daily deaths; E[D] = estimate of daily deaths. N is an n-dimensional vector, where n is around 60. E[D] is another n-dimensional vector. Under certain assumptions, each entry of E[D] can be calculated as a linear combination of the entries of N. We want to find the vector N such that the E[D] derived from N has …
Regression performance with Feature Selection

I would like to ask a theoretical question. In my project I am trying to get better performance from my regression model with feature selection methods, especially CatBoost feature importances. I would like to ask: 1- I know the term "Garbage in, garbage out", so more features do not always mean better performance; they can even decrease it. But can we get a better evaluation score, such as MSE or RMSE, by eliminating less important features from the model? …
Math behind, MSE = bias^2 + variance

Based on the deeplearningbook: $$MSE = E[(\theta_m^{-} - \theta)^2] = Bias(\theta_m^{-})^2 + Var(\theta_m^{-})$$ where m is the number of samples in the training set, $\theta$ is the actual parameter in the training set and $\theta_m^{-}$ is the estimated parameter. I can't get to the second expression. Further, I don't understand how the first expression is obtained. Note: $Bias(\theta_m^{-})^2 = E(\theta_m^{-2}) - \theta^2$ Also, how are bias and variance evaluated in classification?
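For reference, the standard derivation can be sketched as follows, keeping the question's symbol $\theta_m^{-}$ for the estimator and writing $\mu = E[\theta_m^{-}]$ (the symbol $\mu$ is introduced here only for brevity). With $Bias(\theta_m^{-}) = E[\theta_m^{-}] - \theta = \mu - \theta$ and $Var(\theta_m^{-}) = E[(\theta_m^{-} - \mu)^2]$, add and subtract $\mu$ inside the square:

$$E[(\theta_m^{-} - \theta)^2] = E[((\theta_m^{-} - \mu) + (\mu - \theta))^2]$$

$$= E[(\theta_m^{-} - \mu)^2] + 2(\mu - \theta)\,E[\theta_m^{-} - \mu] + (\mu - \theta)^2$$

The middle term vanishes because $E[\theta_m^{-} - \mu] = \mu - \mu = 0$, and $\mu - \theta$ is a constant, so

$$E[(\theta_m^{-} - \theta)^2] = Var(\theta_m^{-}) + Bias(\theta_m^{-})^2$$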
Can't understand an MSE loss function in a paper

I'm reading a paper published in NIPS 2021. There's a part in it that is confusing: This loss term is the mean squared error of the normalized feature vectors and can be written as follows: Where $\left\|\cdot\right\|_2$ is $\ell_2$ normalization and $\langle \cdot, \cdot \rangle$ is the dot product operation. As far as I know, the MSE loss function looks like: $L=\frac{1}{2}(y - \hat{y})^{2}$ How does the above equation qualify as an MSE loss function?
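The formula itself did not survive in the excerpt, so the following is only a guess at the kind of identity involved. Assuming the loss has the form $2 - 2\,\langle q, z \rangle / (\|q\|_2 \|z\|_2)$ for two feature vectors $q$ and $z$ (hypothetical names), it equals the squared Euclidean distance between the $\ell_2$-normalized vectors, which is a squared error up to the averaging constant. A small numerical check in plain Python:

```python
import math


def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = l2_norm(v)
    return [x / norm for x in v]


def squared_distance(a, b):
    """Sum of squared elementwise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_form(a, b):
    """2 - 2 * <a, b> / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 2.0 - 2.0 * dot / (l2_norm(a) * l2_norm(b))


# Hypothetical feature vectors.
q = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 4.0]

lhs = squared_distance(l2_normalize(q), l2_normalize(z))
rhs = cosine_form(q, z)
```

The two values agree because $\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\langle a, b \rangle$ and both normalized vectors have unit length.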
I am getting very minimal mse values and not sure if it is correct?

Below is the linear regression model I fitted. I am not sure if I am doing it the right way, as I am getting close to 99% accuracy. Fitting Simple Linear Regression to the Training set from sklearn.linear_model import LinearRegression from sklearn.model_selection import cross_val_score ln_regressor = LinearRegression() mse = cross_val_score(ln_regressor, X_train, Y_train , scoring = 'neg_mean_squared_error', cv = 5) mean_mse = np.mean(mse) print(mean_mse), Y_train) ** MSE SCORE =-6.612466691367042e-06** Predicting the Test set results y_pred = ln_regressor.predict(X_test) Evaluating accuracy of test data …
Appropriate loss function and metrics for regression task with mixed outputs

I'm trying to train an EfficientNet-based Keras model that takes an image as input and returns two numeric values as output. Here's the model: def prepare_model_eff(input_shape): inputs = Input(shape=input_shape) x = EfficientNetB3(include_top=False, input_shape=input_shape)(inputs) x.trainable = True x = layers.GlobalAveragePooling2D()(x) x = layers.Dropout(rate=0.1, )(x) x = layers.BatchNormalization()(x) out_1 = layers.Dense(1, activation='linear', name='out_1')(x) out_2 = layers.Dense(1, activation='linear', name='out_2')(x) model = Model(inputs=inputs, outputs=[out_1, out_2]) As far as I know, the most common metric for such tasks is Root Mean Square Error (RMSE): def …
Find $a, b, c$ minimizing MSE

Suppose you are given a "dummy" classifier. It looks like this: $$ y(x) = \begin{cases} a & \text{if } x \geq c \\ b & \text{otherwise} \end{cases} $$ Given some data set $\{(y_1, x_1), \dots, (y_n, x_n)\}$, how can $a, b, c$ be estimated such that the MSE is minimal?
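One way to approach this is a brute-force search: for a fixed threshold $c$, the MSE-optimal $a$ is the mean of the $y_i$ with $x_i \geq c$ and the optimal $b$ is the mean of the remaining $y_i$, so it is enough to try each distinct $x_i$ as a candidate $c$. A minimal sketch for one-dimensional $x$; the function name `fit_stump` and the sample data are made up for illustration.

```python
def fit_stump(xs, ys):
    """Return (a, b, c, mse) minimizing the MSE of y(x) = a if x >= c else b."""
    best = None
    for c in sorted(set(xs)):
        right = [y for x, y in zip(xs, ys) if x >= c]
        left = [y for x, y in zip(xs, ys) if x < c]
        a = sum(right) / len(right)  # never empty: c is one of the xs
        # With c = min(xs) the left side is empty; b is then unused on this data.
        b = sum(left) / len(left) if left else a
        err = sum((y - (a if x >= c else b)) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if best is None or err < best[3]:
            best = (a, b, c, err)
    return best


# Hypothetical data with a clear jump between x = 2 and x = 3.
xs = [1, 2, 3, 4]
ys = [0, 0, 10, 10]
a, b, c, err = fit_stump(xs, ys)
```

This search is quadratic in the number of points; sorting once and using running sums would make it faster, at the cost of a slightly longer implementation.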
Keras Custom loss Penalize more when actual and prediction are on opposite sides of Zero

I'm training a model to predict percentage change in prices. Both MSE and RMSE are giving me up to 99% accuracy, but when I check how often the actual value and the prediction point in the same direction ((actual > 0 and pred > 0) or (actual < 0 and pred < 0)), I get about 49%. How do I define a custom loss that penalizes opposite directions very heavily? I'd also like to add a slight penalty for when the …
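The idea of penalizing sign disagreements more heavily can be sketched in plain Python as below. This is not a Keras loss; it only illustrates one possible elementwise rule (multiply the squared error by a factor when actual and prediction have opposite signs), and the function name and the `penalty` value are assumptions. A Keras version would need the same rule expressed with tensor operations.

```python
def directional_mse(y_true, y_pred, penalty=10.0):
    """Mean squared error where sign disagreements are scaled by `penalty`."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        sq = (t - p) ** 2
        if t * p < 0:  # actual and prediction are on opposite sides of zero
            sq *= penalty
        total += sq
    return total / len(y_true)


# Two hypothetical cases with the same absolute error of 0.01.
same_side = directional_mse([0.02], [0.01])
opposite_side = directional_mse([0.005], [-0.005])
```

With this rule an error of the same size costs `penalty` times more when the predicted direction is wrong, which pushes the optimizer toward getting the sign right first.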
Is there a quicker solution to Sklearn MAE?

I am attempting to run RandomForestRegressor on this fairly large dataset: df_train.describe(): Unnamed: 0 col1 col2 col3 col4 col5 count 8.886500e+05 888650.000000 888650.000000 888650.000000 888650.000000 888650.000000 mean 5.130409e+05 2.636784 3.845549 4.105381 1.554918 1.221922 std 2.998785e+05 2.296243 1.366518 3.285802 1.375791 1.233717 min 4.000000e+00 1.010000 1.010000 1.010000 0.000000 0.000000 25% 2.484332e+05 1.660000 3.230000 2.390000 1.000000 0.000000 50% 5.233705e+05 2.110000 3.480000 3.210000 1.000000 1.000000 75% 7.692788e+05 2.740000 3.950000 4.670000 2.000000 2.000000 max 1.097490e+06 90.580000 43.420000 99.250000 22.000000 24.000000 df_test.describe(): Unnamed: 0 col1 col2 …
Combine several performance metrics from several datasets

We are developing and evaluating a multi knee/elbow point detection algorithm. For our evaluation, we have 200 sequences of real data. These sequences were annotated manually. For each algorithm and sequence, we computed four different performance metrics: two variations of MSE and two custom cost functions. The question is: how can we combine the results in a summary to identify the overall best performing model? Our solution right now is to use two simple counting/voting systems. The first is binary, the …
