regularization

Confusion with L2 Regularization in Back-propagation

10xAI

June 3, 2022 17:22

In very simple terms, this is L2 regularization: $Loss_R = Loss_N + \sum w_i^2$, where $Loss_N$ is the loss without regularization and $Loss_R$ is the loss with regularization. When implementing it [Ref], we simply add the derivative of the new penalty to the current weight delta: $dw = dw_N + constant \cdot w$, where $dw_N$ is the weight delta without regularization. What I think: L2 regularization is achieved with the last step only, i.e. the weight is penalized. My question is: why do we then add …

Topic: mathematics regularization backpropagation

Category: Data Science
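For the L2 question above, a minimal sketch of why the gradient step gains a term proportional to $w$. The function names and the factor of 1/2 in the penalty are assumptions made for illustration (the excerpt writes the penalty as $\sum w_i^2$ and leaves the constant unspecified); the check only compares the analytic gradient of the penalty with a finite-difference estimate.

```python
def l2_penalty(weights, lam):
    # Assumed form: (lam / 2) * sum(w_i^2), so the gradient is exactly lam * w_i.
    return 0.5 * lam * sum(w * w for w in weights)


def l2_penalty_grad(weights, lam):
    # Analytic gradient of the penalty: the "constant * w" term added to dw.
    return [lam * w for w in weights]


def numerical_grad(f, weights, eps=1e-6):
    # Central finite differences, used only to check the analytic gradient.
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


weights = [0.5, -1.5, 2.0]  # illustrative values
lam = 0.1
analytic = l2_penalty_grad(weights, lam)
numeric = numerical_grad(lambda w: l2_penalty(w, lam), weights)
print(analytic)
print(numeric)
```

Under these assumptions, adding the penalty to the loss and adding `lam * w` to the gradient describe the same change, seen once in the objective and once in the update.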

Importing Excel format data into R/R Studio and using glmnet package?

Sympa

May 31, 2022 03:03

I have no problem importing Excel-formatted data into R/R Studio and using all the other R packages that I use. But when I want to use the glmnet package to develop a regularization model, I invariably run into the following error (after specifying my regularization model and attempting to run it): Error in storage.mode(y) <- "double": (list) object cannot be coerced to type 'double' Here is what I have already tried to resolve this: De-format the numbers in Excel (no …

Topic: regularization data excel error-handling r

Category: Data Science

Regularizing the intercept - particular case

ChuckNoise

May 20, 2022 08:16

Yesterday I posted the thread "Regularizing the intercept", where I had a question about penalizing the intercept. In short, I asked whether there exist cases where penalizing the intercept leads to a lower expected prediction error, and the answer was: Of course there exist scenarios where it makes sense to penalize the intercept, if that aligns with domain knowledge. However, in the real world, more often we do not just penalize the magnitude of the intercept, but enforce it to be zero. …

Topic: lasso ridge-regression regularization

Category: Data Science

Correct theoretical regularized objective function for XGB/LGBM (regression task)

Manu675

May 20, 2022 04:03

I am writing an academic paper on the application of Machine Learning methods to Time Series Forecasting and I am unsure about how to write down the theoretical part about the regularized objective function for XGBoosting. Below you can find the equation given by the developers of the XGB algorithm for the regularized objective function (equation 2). The paper is called "XGBoost: A Scalable Tree Boosting System" by Chen & Guestrin (2016). In the Python API from the xgb library …

Topic: lightgbm regularization xgboost

Category: Data Science

Regularizing the intercept

ChuckNoise

May 18, 2022 19:30

I am reading The Elements of Statistical Learning and regarding regularized logistic regression it says: "As with the lasso, we typically do not penalize the intercept term" and I am wondering in which situations you would penalize the intercept? Looking at regularization in general, couldn't one think of scenarios where penalizing the intercept would lead to a better EPE (expected prediction error)? Although we increase the bias wouldn't we in some scenarios still reduce the EPE? EDIT It might be …

Topic: regularization

Category: Data Science

Custom regularisation for logistic regression

claudius

May 14, 2022 08:04

My understanding of L2 regularisation: the weights of the model are assumed to have a prior Gaussian distribution centered around 0. The MAP estimate over the data then adds an extra penalty to the cost function. My problem statement: I am making a reasonable assumption (based on domain knowledge) that my features are independent, which means I can use the weights of the features to infer the importance of features in influencing Y. From domain knowledge, I want to assume priors about the ratio of …

Topic: bayesian regularization logistic-regression

Category: Data Science

Why does non-differentiable regularization lead to setting coefficients to 0?

Victor

April 28, 2022 22:01

L2 regularization shrinks the values in the parameter vector. L1 regularization sets some coefficients in the parameter vector to exactly 0. More generally, I've seen that non-differentiable regularization functions lead to setting coefficients to 0 in the parameter vector. Why is that the case?

Topic: regularization

Category: Data Science
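For the question above about non-differentiable penalties, a one-dimensional sketch may help. It assumes the toy objectives $\frac{1}{2}(w - z)^2 + \lambda |w|$ and $\frac{1}{2}(w - z)^2 + \frac{\lambda}{2} w^2$, whose minimizers have well-known closed forms; the function names and sample values are illustrative only, and this is not a general proof for every non-differentiable penalty.

```python
def soft_threshold(z, lam):
    # Minimizer of 0.5 * (w - z)**2 + lam * abs(w).
    # The kink of abs(w) at 0 makes every z in [-lam, lam] map to exactly 0.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    # Minimizer of 0.5 * (w - z)**2 + 0.5 * lam * w**2.
    # The penalty is smooth at 0, so the solution is only rescaled.
    return z / (1.0 + lam)


values = [-2.0, -0.3, 0.1, 0.8, 3.0]  # illustrative unregularized solutions
lam = 0.5
l1_solution = [soft_threshold(z, lam) for z in values]
l2_solution = [ridge_shrink(z, lam) for z in values]
print(l1_solution)
print(l2_solution)
```

In this toy setting the L1 solution is exactly zero for the two inputs whose magnitude is below `lam`, while the L2 solution shrinks every nonzero input without zeroing any of them.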

Regularization and loss function

Piskator

April 25, 2022 09:31

I am currently trying to get a better understanding of regularization as a concept. This leads me to the following question: Will regularization change when we change the loss function? Is it correct that this is the sole way that these concepts are related?

Topic: regularization loss-function optimization

Category: Data Science

Why is my loss blowing up after adding regularization

BOSSrobot

April 14, 2022 20:01

I tried to add L2 regularization to a network class I wrote; however, when I train it, the loss blows up even though accuracy also increases. Can someone explain where I am going wrong? (I am using the formulas from here.) The update to the mini-batch (the (1-eta*(lmbda/n)) coefficient on w is what I added): def update_mini_batch(self, mini_batch, eta, lmbda, n): # n is the number of training samples being trained from # Turn the mini_batch with one dimensional samples into …

Topic: regularization loss-function neural-network

Category: Data Science
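The `(1-eta*(lmbda/n))` coefficient mentioned in the question above can be sketched in isolation. This is a standalone illustration, not the asker's `update_mini_batch` method: the helper names, the flat weight list, and the cost term $\frac{\lambda}{2n}\sum w^2$ are assumptions chosen to match the usual weight-decay form of the update.

```python
def l2_update(weights, grad_sum, eta, lmbda, n, batch_size):
    # Weight-decay form of the L2-regularized step:
    #   w <- (1 - eta * lmbda / n) * w - (eta / batch_size) * grad_sum
    # grad_sum is the gradient of the unregularized loss summed over the mini-batch.
    decay = 1.0 - eta * (lmbda / n)
    return [decay * w - (eta / batch_size) * g for w, g in zip(weights, grad_sum)]


def l2_cost_term(weights, lmbda, n):
    # Assumed matching penalty in the reported cost: (lmbda / (2 * n)) * sum(w^2).
    return (lmbda / (2.0 * n)) * sum(w * w for w in weights)


weights = [1.0, -2.0]    # illustrative values
grad_sum = [0.4, -0.2]
new_weights = l2_update(weights, grad_sum, eta=0.5, lmbda=5.0, n=100, batch_size=10)
penalty = l2_cost_term(weights, lmbda=5.0, n=100)
print(new_weights, penalty)
```

One thing worth checking with an update of this shape is that `eta * lmbda / n` stays well below 1, since a decay factor outside the range 0 to 1 no longer shrinks the weights. Whether that applies to the code in the question cannot be told from the excerpt.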

Version of Perceptron

Ben

March 20, 2022 23:01

If we change the $ywx<0$ condition (for performing update) to $ywx<1$ like in SVM (but without adding regularization to maximize the margin), is there any difference from the basic perceptron (the one with the aforementioned $ywx<0$ condition)?

Topic: perceptron regularization svm machine-learning

Category: Data Science
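The two update conditions in the perceptron question above can be compared with a small experiment. This is a sketch under stated assumptions: the toy data, the function names, and the stopping rule are invented for illustration, and the test uses `<=` rather than the strict `<` from the question, because with zero-initialized weights a strict test against 0 would never trigger a first update.

```python
def margin(w, x, y):
    # Functional margin y * (w . x) of one labelled sample.
    return y * sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(samples, labels, threshold, max_epochs=100):
    # Perceptron rule: update whenever the margin is at or below `threshold`.
    # threshold=0.0 is the basic perceptron; threshold=1.0 is the margin variant.
    w = [0.0] * len(samples[0])
    updates = 0
    for _ in range(max_epochs):
        changed = False
        for x, y in zip(samples, labels):
            if margin(w, x, y) <= threshold:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updates += 1
                changed = True
        if not changed:
            break  # a full pass without updates: training has converged
    return w, updates


# Small, linearly separable toy data (illustrative only).
samples = [[0.2, 0.1], [0.1, 0.3], [-0.2, -0.1], [-0.1, -0.3]]
labels = [1, 1, -1, -1]

w_plain, updates_plain = train_perceptron(samples, labels, threshold=0.0)
w_margin, updates_margin = train_perceptron(samples, labels, threshold=1.0)
print(w_plain, updates_plain)
print(w_margin, updates_margin)
```

On this data the margin variant keeps updating on points that are already classified correctly but have a small margin, so it makes more updates and ends with a larger weight vector. Both runs classify the toy data correctly.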

Why is l1 regularization rarely used compared to l2 regularization in Deep Learning?

seermer

March 17, 2022 17:03

l1 regularization increases sparsity, so unimportant weights are decreased closer to 0. In Deep Learning models, the input usually consists of thousands or millions of features/pixels, and the network usually contains millions to even billions of weights. Intuitively and theoretically, such feature selection should be very helpful in Deep Learning models to reduce overfitting problems since not all features/weights are important, selecting important ones from millions of weights reduces the function complexity, which therefore reduces the possibility of "memorizing" the …

Topic: regularization deep-learning neural-network machine-learning

Category: Data Science

Regularization for intercept parameter

N.M

March 12, 2022 17:02

Why is the regularization parameter not applied to the intercept parameter? From what I have read about the cost functions for Linear and Logistic regression, the regularization parameter (λ) is applied to all terms except the intercept. For example, here are the cost functions for linear and logistic regression respectively (Notice that j starts from 1):

Topic: cost-function regularization linear-regression regression logistic-regression

Category: Data Science
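One common argument for leaving the intercept out of the penalty, relevant to the question above, is that the fit should not depend on the origin chosen for the target. The sketch below illustrates this for one-feature ridge regression; the function names, the data, and the objective $\sum (y - b - wx)^2 + \lambda(\cdot)$ are assumptions made for the example.

```python
def ridge_free_intercept(xs, ys, lam):
    # Minimizes sum((y - b - w*x)^2) + lam * w^2 (intercept b not penalized).
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    b = y_bar - w * x_bar
    return b, w


def ridge_penalized_intercept(xs, ys, lam):
    # Minimizes sum((y - b - w*x)^2) + lam * (w^2 + b^2).
    # Solves the 2x2 normal equations with Cramer's rule.
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = (n + lam) * (sxx + lam) - sx * sx
    b = (sy * (sxx + lam) - sx * sxy) / det
    w = ((n + lam) * sxy - sx * sy) / det
    return b, w


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]             # y = 1 + 2x, illustrative data
ys_shifted = [y + 100.0 for y in ys]  # same data with the target origin moved

_, w_free = ridge_free_intercept(xs, ys, lam=1.0)
_, w_free_shifted = ridge_free_intercept(xs, ys_shifted, lam=1.0)
_, w_pen = ridge_penalized_intercept(xs, ys, lam=1.0)
_, w_pen_shifted = ridge_penalized_intercept(xs, ys_shifted, lam=1.0)
print(w_free, w_free_shifted)
print(w_pen, w_pen_shifted)
```

With the intercept left unpenalized, adding a constant to every target leaves the slope unchanged. With the intercept penalized, the same shift changes the slope substantially on this data.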

Regularization hyperparam tuning during training

Oren Matar

February 18, 2022 21:08

I have an idea for a regularization-hyperparam selection method, which I haven't encountered before and can't find on Google, but I'm sure someone has already tried it, and I'm wondering what the best practices are. The most common method for hyperparam selection is to select different hyperparams (e.g. some value for L2 regularization), train NNs with them, test the NNs on some validation set, and select the best one. My idea is to train a single NN and …

Topic: hyperparameter-tuning overfitting regularization neural-network

Category: Data Science

Is it possible to explain why Lasso models eliminated certain coefficients?

NAS

February 16, 2022 08:30

Is it possible to understand why Lasso models eliminated specific coefficients? During modelling, many of the highly correlated features in the data are being eliminated by Lasso regression. Is it possible to explain why precisely these features are being eliminated from the model (is it the presence of other features, multicollinearity, etc.)? I want to explain the lasso model's behaviour. Your help is highly appreciated.

Topic: linear-models lasso regularization linear-regression correlation

Category: Data Science

Request: Confirmation on my understanding of overfitting and regularization concepts

Enthusiast

February 8, 2022 06:01

Overfitted models tend to have widely different (some very high, some comparatively low) coefficients/weights for different features. This means the model (when drawn as a graph) will have high variation in slopes, and even a small change in a training data value (feature value) can lead to a large change in output. To smooth the overfitted model/curve that has high slope variation, we use regularization (example: L1/L2). L1 regularization removes unnecessary/less influential features from the model, making the model less complex. …

Topic: overfitting regularization machine-learning

Category: Data Science

L1 regularization to first layer or all the layers

Wenuka

February 7, 2022 09:56

I have lots of features in the input to a Fully Connected Neural Network (FCNN) and was thinking of adding L1 regularization to select only the most relevant features. I found how to add it following this link, and added it to the weights of the first layer (my FCNN is 4 layers deep). However, when I manually check the weights, all of them are now super small (<1e-4) and none of them are zero as I expected (that's why I …

Topic: regularization neural-network machine-learning

Category: Data Science

What exactly is activity sparsity and why is it beneficial?

Luuk

February 3, 2022 14:58

I have been reading about weight sparsity and activity sparsity with regard to convolutional neural networks. Weight sparsity I understood as having more trainable weights being exactly zero, which would essentially mean having fewer connections, allowing for a smaller memory footprint and quicker inference on test data. Additionally, it would help against overfitting (which I understand in terms of smaller weights leading to simpler models/Ockham's razor). From what I understand now, activity sparsity is analogous in that it would lead …

Topic: sparse sparsity cnn regularization

Category: Data Science

What is the intuition behind decreasing the slope when using regularization?

satinder singh

January 31, 2022 02:00

While training a logistic regression model, using regularization can help distribute weights and avoid reliance on some particular weight, making the model more robust. E.g.: suppose my input vector is 4-dimensional. The input values are [1,1,1,1]. The output can be 1 if my weight matrix has values [1,0,0,0] or [0.25,0.25,0.25,0.25]. The L2 norm would favor the latter weight matrix (because pow(1, 2) > 4*pow(0.25,2)). I understand intuitively why l2 regularization can be beneficial here. But in case of linear …

Topic: regularization

Category: Data Science
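The arithmetic in the question above can be checked directly. In this sketch the "output" is taken to be the dot product of weights and input, which is an assumption about what the asker means; the L1 comparison is an extra that is not in the excerpt.

```python
x = [1.0, 1.0, 1.0, 1.0]
w_peaked = [1.0, 0.0, 0.0, 0.0]
w_spread = [0.25, 0.25, 0.25, 0.25]


def dot(w, inputs):
    # Assumed "output": the linear score w . x before any activation.
    return sum(wi * xi for wi, xi in zip(w, inputs))


def squared_l2(w):
    # L2 penalty without any scaling constant: sum of squared weights.
    return sum(wi * wi for wi in w)


def l1(w):
    # L1 penalty: sum of absolute weights.
    return sum(abs(wi) for wi in w)


outputs = (dot(w_peaked, x), dot(w_spread, x))
l2_penalties = (squared_l2(w_peaked), squared_l2(w_spread))
l1_penalties = (l1(w_peaked), l1(w_spread))
print(outputs, l2_penalties, l1_penalties)
```

Both weight vectors give the same score on this input, but the squared L2 penalty is four times smaller for the spread-out vector, while the L1 penalty is identical for the two.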

Convolutional Neural Network overfitting

Simon

January 24, 2022 14:58

I built a CNN to learn to classify EEG data (only about 4000 training examples, 2 classes, 50-50 class balance). Each training example is 64x512, with 5 channels each. I've tried to keep the network as simple/small as possible for testing: ConvLayer (4 filters), MaxPool, Dropout 50%, Fully connected (50 neurons), Dropout 50%, Softmax. I'm also using weight decay (L2 reg, lambda = 0.001). The problem is no matter how I play with the filter parameters (size, stride, number) my …

Topic: convolutional-neural-network regularization neural-network classification machine-learning

Category: Data Science

Problems with Graphical Lasso

F Lourenço

January 14, 2022 17:04

I'm trying to use the Graphical Lasso algorithm (more specifically the R package glasso) to find an estimated graph representing the connections between a set of nodes by estimating a precision matrix. I have a feature matrix containing the values of multiple features for each of the nodes, and the sample covariance matrix obtained from the product between this matrix and its transpose is used as the input for the glasso function, along with the l1 regularization coefficient $\lambda$. However, …

Topic: graphical-model regularization correlation graphs r

Category: Data Science
