There's a dataset containing the metadata of Twitch's top 1,000 streamers of 2020 (details are available here). I am currently participating in a challenge to predict the values of Followers gained, by creating and training a model on the remaining features from the dataset. The kernel's objective is to get the lowest RMSE (Root Mean Squared Error) from the model's predictions. So far I have made numerous attempts to lower the RMSE value as much …
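For context, a minimal sketch of the kind of baseline such a challenge usually starts from; the file name and column names below are assumptions about the Kaggle dataset, not taken from the question:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Hypothetical file/column names; the actual dataset may differ.
    df = pd.read_csv("twitchdata-update.csv")
    y = df["Followers gained"]
    X = df.drop(columns=["Followers gained"]).select_dtypes("number")

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

    # RMSE is the square root of the mean squared error of the predictions.
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"RMSE: {rmse:.2f}")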
Is SI the RMSE divided by the average of the observed values (or of the predicted values? I am confused)? Is SI = 25% acceptable? (Is the model good enough?)
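For reference, the scatter index is commonly defined with the mean of the observed values in the denominator; a minimal sketch of that definition:

    import numpy as np

    def scatter_index(y_obs, y_pred):
        """SI = RMSE divided by the mean of the observed values, as a percentage."""
        y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
        rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
        return 100.0 * rmse / np.mean(y_obs)

    # SI of 25% means the RMSE is a quarter of the typical observed magnitude.
    print(scatter_index([4.0, 5.0, 6.0], [4.5, 5.5, 5.0]))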
I am doing linear regression on the Boston Housing data set, and applying $\log(y)$ has a huge impact on the MSE: without the transform the MSE is 34.94, while with $y$ log-transformed it is 0.05.
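The two numbers are not directly comparable because the second MSE is in log units; one way to see this is to back-transform the predictions before scoring. A minimal sketch on synthetic data (not the Boston set itself):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic positive target, just to illustrate the scale issue.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = np.exp(X @ np.array([0.4, 0.2, -0.3, 0.1, 0.5]) + rng.normal(scale=0.2, size=500))

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Model fitted on log(y): its MSE is in log units, so it looks much smaller.
    m_log = LinearRegression().fit(X_tr, np.log(y_tr))
    mse_log_scale = mean_squared_error(np.log(y_te), m_log.predict(X_te))

    # Back-transform the predictions before scoring to compare on the original scale.
    mse_original_scale = mean_squared_error(y_te, np.exp(m_log.predict(X_te)))

    print(mse_log_scale, mse_original_scale)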
I am estimating water depth with satellite data (predicted value) and would like to validate my result using bathymetric lidar data collected in the field, which is believed to be more accurate (observed value). I have a different number of observations at each water depth. For example, the number of observations in the 0-10 m depth range is 300, whereas the deeper range (10-20 m) has far fewer (~50 points). I have been using RMSE (as I would like to …
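One option in this situation is to report RMSE per depth bin, so the densely sampled shallow range does not dominate a single pooled figure. A sketch with hypothetical column names:

    import numpy as np
    import pandas as pd

    # Hypothetical columns: lidar depth (observed) and satellite-derived depth (predicted).
    df = pd.DataFrame({
        "observed": np.random.uniform(0, 20, 350),
        "predicted": np.random.uniform(0, 20, 350),
    })

    df["depth_bin"] = pd.cut(df["observed"], bins=[0, 10, 20], labels=["0-10 m", "10-20 m"])
    df["sq_err"] = (df["observed"] - df["predicted"]) ** 2

    # RMSE per bin, alongside the number of points supporting each estimate.
    print(df.groupby("depth_bin", observed=True)["sq_err"].mean() ** 0.5)
    print(df["depth_bin"].value_counts())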
My goal is to develop a model that predicts next customer purchases in USD (update: if a customer made no purchase during the dataset's time period, the next-purchase label is set to zero). I am trying to determine the most effective metric for measuring the model's performance. The results look like this:

    y_true_usd  y_predicted_usd
    1.2         0.8
    0           0.3
    0           1.1
    0           0
    0           0.1
    5.3         4.3

First I thought about going with RMSE, …
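A quick sketch comparing two candidate metrics on exactly those example values, to see how each treats the many zero labels:

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    y_true = np.array([1.2, 0.0, 0.0, 0.0, 0.0, 5.3])
    y_pred = np.array([0.8, 0.3, 1.1, 0.0, 0.1, 4.3])

    rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalises the larger misses more
    mae = mean_absolute_error(y_true, y_pred)            # treats all dollar errors linearly

    print(f"RMSE = {rmse:.3f} USD, MAE = {mae:.3f} USD")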
I have a model with 7 features, and I'm trying to figure out whether I can improve its performance by adding additional features, so I'm relying on the RMSE to measure the accuracy of my predictions. I go from 7 features up to 25, and each time I add a new feature the RMSE gets slightly better (smaller). I find it hard to believe that all of these features improved the performance of my model, as …
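One way to check whether each improvement is real or just fitting noise is to track in-sample RMSE alongside cross-validated RMSE as features are added. A sketch, assuming the columns of X are ordered by the sequence in which the features were added:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def rmse_curves(X, y, max_features):
        """In-sample vs cross-validated RMSE when using the first k columns of X."""
        for k in range(7, max_features + 1):
            Xk = X[:, :k]
            model = LinearRegression().fit(Xk, y)
            in_sample = np.sqrt(np.mean((y - model.predict(Xk)) ** 2))
            cv = -cross_val_score(LinearRegression(), Xk, y, cv=10,
                                  scoring="neg_root_mean_squared_error").mean()
            # In-sample RMSE can only go down as features are added; CV RMSE will
            # flatten or rise once the extra features stop carrying real signal.
            print(k, round(in_sample, 3), round(cv, 3))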
I have trained an LSTM model on a dataset, but its loss during training is ten times the RMSE on the test set. How is that possible, and can I use this model if the RMSE is very low but the loss is high? How can I improve the training and test loss?
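One common cause is simply a units mismatch: the training loss is often MSE while the evaluation metric is RMSE, and RMSE is the square root of MSE, so the two are not on the same scale. A minimal sketch of putting them side by side:

    import numpy as np

    mse_loss = 100.0          # e.g. the value reported by the training loop (MSE)
    rmse = np.sqrt(mse_loss)  # the same error expressed as RMSE

    # An MSE of 100 corresponds to an RMSE of 10, exactly a factor of ten here,
    # so a "loss ten times the test RMSE" may just be the square/root relationship.
    print(mse_loss, rmse)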
Context: I'm currently crafting and comparing machine learning models to predict housing data. I have around 32,000 data points and 42 features, and I'm predicting housing price. I'm comparing a Random Forest Regressor, a Decision Tree Regressor, and Linear Regression. I can tell there is some overfitting going on, as my initial values vs. cross-validated values are as follows: RF: 10-fold R squared = 0.758, neg RMSE = -540.2 vs. unvalidated R squared of 0.877, RMSE of 505.6; DT: 10-fold …
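For context on the negative RMSE values: scikit-learn's cross-validation scorers return negated errors so that larger is always better. A sketch of how numbers like these are typically produced and read (X and y stand for the housing features and prices, which are not shown here):

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    def cv_report(X, y):
        model = RandomForestRegressor(random_state=0)

        scores = cross_val_score(model, X, y, cv=10,
                                 scoring="neg_root_mean_squared_error")
        # The scorer negates RMSE, so flip the sign back before reporting.
        print("10-fold RMSE:", -scores.mean())

        r2 = cross_val_score(model, X, y, cv=10, scoring="r2")
        print("10-fold R squared:", r2.mean())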
Suppose I made a model which has an RMSE of 50, and the next value I predict is 500. Does that mean the actual value has a high probability of being within the range 450-550? If so, what is that probability? Or does it mean the actual value has a high probability of being within the range 475-525? If so, what is the probability that it will …
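If the residuals are roughly Gaussian with mean zero and standard deviation close to the RMSE (a strong assumption), the coverage of such intervals can be read off the normal distribution; a sketch:

    from scipy.stats import norm

    rmse = 50.0
    prediction = 500.0

    # P(actual within +/- 1 RMSE of the prediction) under a N(0, RMSE^2) residual model.
    p_1_rmse = norm.cdf(1) - norm.cdf(-1)          # ~0.68 for [450, 550]
    p_half_rmse = norm.cdf(0.5) - norm.cdf(-0.5)   # ~0.38 for [475, 525]

    # For ~95% coverage you would need roughly +/- 1.96 * RMSE, i.e. about [402, 598].
    print(p_1_rmse, p_half_rmse)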
I'm trying to train an EfficientNet-based Keras model that takes an image as input and returns two numeric values as output. Here's the model:

    def prepare_model_eff(input_shape):
        inputs = Input(shape=input_shape)
        x = EfficientNetB3(include_top=False, input_shape=input_shape)(inputs)
        x.trainable = True
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dropout(rate=0.1)(x)
        x = layers.BatchNormalization()(x)
        out_1 = layers.Dense(1, activation='linear', name='out_1')(x)
        out_2 = layers.Dense(1, activation='linear', name='out_2')(x)
        model = Model(inputs=inputs, outputs=[out_1, out_2])
        return model

As far as I know, the most common metric for such tasks is Root Mean Square Error (RMSE): def …
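The excerpt is cut off before the metric definition; for reference, a sketch of how RMSE is commonly attached to a two-output Keras model like this one, either via the built-in metric or a small custom function. This is not necessarily the asker's exact code, and the input shape below is just an example:

    import tensorflow as tf
    from tensorflow.keras import backend as K

    # Example input shape; EfficientNetB3's default resolution is 300x300.
    model = prepare_model_eff(input_shape=(300, 300, 3))

    # Built-in RMSE metric, one instance per output head.
    model.compile(
        optimizer="adam",
        loss={"out_1": "mse", "out_2": "mse"},
        metrics={"out_1": tf.keras.metrics.RootMeanSquaredError(),
                 "out_2": tf.keras.metrics.RootMeanSquaredError()},
    )

    # Equivalent hand-written metric, if a plain function is preferred.
    def rmse(y_true, y_pred):
        return K.sqrt(K.mean(K.square(y_pred - y_true)))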
I am trying to determine which model result is better. Both runs are trying to achieve the same objective; the only difference is the exact data being used. I used random forest, xgboost, and elastic net for regression. Here is one of the results, which has a low RMSE but a not-so-good R²:

    model   n_rows_test  n_rows_train  r2                 rmse
    rf      128144       384429        0.258415240861579  8.44255341472637
    xgb     128144       384429        0.103772500839367  9.28116624462333
    e-net   128144       384429        0.062460300392487  9.49266713837073

The other model run has …
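When comparing runs on different data, it helps to remember the direct link between the two metrics: R² = 1 − MSE / Var(y), so the same RMSE corresponds to a different R² whenever the target variance differs. A small sketch of that relationship:

    import numpy as np

    def r2_from_rmse(rmse, y_true):
        """R squared implied by a given RMSE on a given target vector."""
        return 1.0 - rmse ** 2 / np.var(y_true)

    # The same RMSE of 8.44 looks better or worse depending on how spread out y is.
    y_wide = np.random.normal(0, 15, 10_000)   # high-variance target
    y_narrow = np.random.normal(0, 9, 10_000)  # low-variance target
    print(r2_from_rmse(8.44, y_wide), r2_from_rmse(8.44, y_narrow))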
So I'm studying machine learning through R, and I'm working with the Boston data set from the MASS library. I am practicing bootstrapping. I have already carried out an analysis to determine how many distinct data points, on average, are drawn from the sample to make up a bootstrap resample, using B=100 resamples of the dataset. Next I would like to do two things: perform bootstrapping of an ordinary linear regression model using B=100 resamples of the data set again, and use …
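The question itself is about R and MASS::Boston; purely as a language-agnostic illustration of the same idea (B = 100 bootstrap resamples of an ordinary linear regression), here is a sketch in Python, the language used for the other sketches in this section:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def bootstrap_ols(X, y, B=100, seed=0):
        """Fit OLS on B bootstrap resamples and collect the coefficient estimates."""
        rng = np.random.default_rng(seed)
        n = len(y)
        coefs = []
        for _ in range(B):
            idx = rng.integers(0, n, size=n)   # sample row indices with replacement
            model = LinearRegression().fit(X[idx], y[idx])
            coefs.append(model.coef_)
        return np.array(coefs)                 # B x p matrix of bootstrap coefficients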
I have written a simple neural network (MLPRegressor) to fit simple data frame columns. To find an optimal architecture, I also defined it as a function to see whether it converges to a pattern. But every time I run the model, it gives me a different result than the last run, and I do not know why. Because it is fairly difficult to make the question reproducible, I can not …
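Run-to-run variation with MLPRegressor usually comes from the random weight initialisation and data shuffling; fixing random_state makes repeated runs comparable. A minimal sketch (the layer sizes are just placeholders):

    from sklearn.neural_network import MLPRegressor

    # With random_state fixed, repeated fits on the same data give the same result;
    # without it, each run starts from different random weights.
    model = MLPRegressor(hidden_layer_sizes=(32, 16),
                         max_iter=2000,
                         random_state=42)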
The data I have is time series data (stock returns), and I am training a Random Forest Regressor on it. Total observations = 2499. To better evaluate performance, I have implemented rolling-window testing with training window sizes = 500, 700, 900, ..., 2100. Though instinctively it would seem obvious to choose the window size that produced the lowest RMSE, how can I be sure that the comparison is fair? I mean, with increasing window size, the test set size …
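One way to keep the comparison fair is to hold the test window fixed (same length, and ideally the same dates) while only the training window length varies; a sketch of that layout:

    import numpy as np

    def rolling_rmse(y, X, train_size, test_size, fit_predict):
        """Walk forward through the series with a fixed-length test window."""
        errors = []
        start = 0
        while start + train_size + test_size <= len(y):
            tr = slice(start, start + train_size)
            te = slice(start + train_size, start + train_size + test_size)
            preds = fit_predict(X[tr], y[tr], X[te])
            errors.append(np.sqrt(np.mean((y[te] - preds) ** 2)))
            start += test_size              # advance by one test window
        return np.mean(errors)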
I have a dataframe containing the IDs of 2,000 questions, a list of scores representing difficulty, and the following features: how often the question was answered, how often the answer was changed because the students were undecided, a normalized "frequency of changing the answer" (the previous two features divided), and the average time spent on a question. The most important seems to be this normalized frequency (50%), then the average time (22%), then how often the question was answered …
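Those percentages look like tree-based feature importances; for reference, a sketch of how such numbers are typically obtained (the column names here are hypothetical):

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    def importance_report(df):
        # Hypothetical column names for the four features and the difficulty target.
        features = ["times_answered", "times_changed", "change_freq_norm", "avg_time"]
        model = RandomForestRegressor(random_state=0).fit(df[features], df["difficulty"])
        # Importances sum to 1, e.g. ~0.50 for the normalised change frequency.
        return pd.Series(model.feature_importances_,
                         index=features).sort_values(ascending=False)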
I've been able to build a few linear regression models that can predict material strength quite well: a minimum RMSE of 17.95 using 11 attributes that I selected from the 159 original attributes. The data is distributed with mean = 234.4 and stdev = 19.9. I am working in Orange3. When using only the highest-weighted attribute (weight 8.013), the model achieves an RMSE of 18.767. If I use only the lowest-weighted attribute (weight 0.051), the RMSE is 20.007. The difference is 1.24, or …
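A useful reference point here: a model that always predicts the mean of the target has an RMSE equal to the target's standard deviation (about 19.9 in this data), which puts the 17.95 and 20.007 figures in context. A sketch of that baseline check:

    import numpy as np

    def rmse(y_true, y_pred):
        return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

    def baseline_rmse(y):
        """RMSE of always predicting the mean; equals the (population) std of y."""
        y = np.asarray(y, dtype=float)
        return rmse(y, np.full(y.shape, y.mean()))

    # Against a mean-only baseline near 19.9 (the target's stdev), an RMSE of 20.0
    # adds essentially nothing, while 17.95 is a modest but real improvement.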
I have created a couple of models for my master's project, and I used several metrics for evaluation. I used MSE, MAE, MAPE, and RMSE, not because I have really studied them in depth, but because I saw these metrics being used in many other projects. Now I have a problem: I need to interpret the results. I searched for articles or studies that classify metric performance as good, bad, or excellent. The only material I have found so far is this …
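For reference, all four metrics can be computed side by side with scikit-learn; what counts as "good" then has to be judged relative to the scale and variance of the target rather than against universal thresholds. A sketch:

    import numpy as np
    from sklearn.metrics import (mean_absolute_error,
                                 mean_absolute_percentage_error,
                                 mean_squared_error)

    def report(y_true, y_pred):
        mse = mean_squared_error(y_true, y_pred)
        print("MSE :", mse)
        print("RMSE:", np.sqrt(mse))                                    # same units as y
        print("MAE :", mean_absolute_error(y_true, y_pred))             # same units as y
        print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))  # relative error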
I added the R² value and the formula of the regression function, but I also want the RMSE value on my plot. Maybe I need to add something, but I could not find a proper answer to this question either here or on Google...

    ggplot(data = AGB.rf$pred) +
      geom_point(mapping = aes(x = pred, y = obs, color = pred), shape = 1) +
      geom_smooth(mapping = aes(x = pred, y = obs), method = "lm", se = FALSE) +
      stat_cor(aes(x = pred, y = obs, label = ..rr.label..), label.y = 3000) +
      …
I'm working on a simple linear regression model to predict 'Label' from 'feature'. The two variables seem to be highly correlated (corr = 0.99). After splitting the data into training and testing sets, I make predictions and evaluate the model: metrics.mean_squared_error(Label_test, Label_Predicted) = 99.17777494521019 and metrics.r2_score(Label_test, Label_Predicted) = 0.9909449021176512. Based on the r2_score, my model is performing almost perfectly (1 being the highest possible value). But when it comes to the mean squared error, I don't know if it shows that my model …
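The two numbers are consistent: R² already compares the MSE to the variance of the test labels, so an MSE of ~99 can still mean a near-perfect fit if the labels themselves vary over a much larger range. A sketch of that check:

    import numpy as np

    def explain_scores(y_test, y_pred):
        mse = np.mean((np.asarray(y_test) - np.asarray(y_pred)) ** 2)
        var = np.var(y_test)
        # r2 = 1 - MSE / Var(y_test): an MSE of ~99 against a label variance of
        # roughly 11,000 (implied by r2 = 0.991) is small relative to the label scale.
        print("MSE :", mse)
        print("Var :", var)
        print("r2  :", 1 - mse / var)
        print("RMSE as % of label std:", 100 * np.sqrt(mse) / np.std(y_test))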