In the Titanic dataset, I tried two methods to fill the missing Age values. The first is regression using Lasso:

from sklearn.linear_model import Lasso

AgefillnaModel = Lasso(copy_X=False)
AgefillnaModel_X.dropna(inplace=True)
y = DF.Age.dropna(inplace=False)
AgefillnaModel.fit(AgefillnaModel_X, y)
DF.loc[ageNaIn, 'Age'] = AgefillnaModel.predict(DF.loc[ageNaIn, AgefillnaModel_X.columns])

The second method uses IterativeImputer() from sklearn.impute:

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

# Setting the random_state argument for reproducibility
imputer = IterativeImputer(random_state=42)
imputed = imputer.fit_transform(DF)
df_imputed = pd.DataFrame(imputed, columns=DF.columns)
round(df_imputed, 2)

Now, how can I decide which one is better? Here is the resulting scatter of Age …
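One rough way to compare them, as a sketch (assuming the same numeric DF as above): hide a random subset of the known Age values, impute them with each method, and score each against the held-out truth with the same masking protocol.

import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
known = DF[DF.Age.notna()].copy()
hide = rng.rand(len(known)) < 0.2            # hold out ~20% of the known ages
truth = known.loc[hide, 'Age'].copy()
known.loc[hide, 'Age'] = np.nan              # pretend these ages are missing

filled = pd.DataFrame(IterativeImputer(random_state=42).fit_transform(known),
                      columns=known.columns, index=known.index)
rmse = np.sqrt(mean_squared_error(truth, filled.loc[hide, 'Age']))
print('IterativeImputer RMSE on held-out ages:', rmse)
# Apply the same masking to the Lasso-based fill and compare the two RMSEs.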
Yesterday I posted this thread Regularizing the intercept where I had a question about penalizing the intercept. In short, I asked whether there exist cases where penalizing the intercept leads to a lower expected prediction error, and the answer was: Of course there exist scenarios where it makes sense to penalize the intercept, if that aligns with domain knowledge. However, in the real world, more often we do not just penalize the magnitude of the intercept, but enforce it to be zero. …
I have a dataset containing 42 instances (X) and one final Y on which I want to perform LASSO regression. All variables are continuous and numerical. As the sample size is small, I wish to extend it. I am somewhat aware of algorithms like SMOTE used for extending imbalanced datasets. Is there anything available for my case, where there is no imbalance?
I'm doing lasso and ridge regression in R with the package chemometrics. With ridgeCV it is easy to extract the SEP and MSEP values via model.ridge$RMSEP and model.ridge$SEP. But how can I do this with lassoCV? model.lasso$SEP works, but there is no RMSE or MSE entry in the list. However, the function produces a plot with MSEP and SEP in the legend, so it must be possible to extract both values. But how? SEP = standard error of the predictions; …
There was a data set I worked with on which I wanted to solve non-negative least squares (NNLS), and I wanted a sparse model. After a bit of experimenting I found that what worked best for me was the following loss function: $$\min_{x \geq 0} ||Ax-b|| + \lambda_1||x||_2^2+\lambda_2||x||_1^2$$ where the squared L2 penalty was implemented by adding white noise with a standard deviation of $\sqrt{\lambda_1}$ to $A$ (which can be shown to be equivalent to ridge regression …
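For what it's worth, here is a minimal sketch of an exact alternative to the noise trick for the squared-residual version of this objective, $\min_{x\ge 0}\|Ax-b\|_2^2+\lambda_1\|x\|_2^2+\lambda_2\|x\|_1^2$: since $x \geq 0$ implies $\|x\|_1 = \mathbf{1}^\top x$, both penalties can be folded into an augmented NNLS problem and handed to a plain NNLS solver.

import numpy as np
from scipy.optimize import nnls

def sparse_nnls(A, b, lam1, lam2):
    """Solve min_{x>=0} ||Ax-b||^2 + lam1*||x||_2^2 + lam2*||x||_1^2 via plain NNLS."""
    n = A.shape[1]
    A_aug = np.vstack([A,
                       np.sqrt(lam1) * np.eye(n),          # ridge rows: add lam1*||x||_2^2
                       np.sqrt(lam2) * np.ones((1, n))])   # one row: adds lam2*(1'x)^2 = lam2*||x||_1^2 for x >= 0
    b_aug = np.concatenate([b, np.zeros(n + 1)])
    x, _ = nnls(A_aug, b_aug)
    return x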
I have a lasso regression model with the following definition:

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import scale
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

folds = KFold(n_splits = 5, shuffle = True, random_state = 100)

# specify range of hyperparameters
hyper_params = …
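For context, a self-contained sketch of how this kind of search is typically wired up (the alpha grid and the X_train/y_train names are assumptions, not from the original post):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

folds = KFold(n_splits=5, shuffle=True, random_state=100)
hyper_params = {'alpha': np.logspace(-4, 1, 20)}          # assumed grid of penalties

model_cv = GridSearchCV(estimator=Lasso(max_iter=10000),
                        param_grid=hyper_params,
                        scoring='r2',
                        cv=folds,
                        return_train_score=True)
# model_cv.fit(X_train, y_train); model_cv.best_params_ then gives the chosen alpha.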
Is it possible to understand why Lasso models eliminated specific coefficients? During modelling, many of the highly correlated features in the data are being eliminated by Lasso regression. Is it possible to explain why precisely these features are being eliminated from the model (is it the presence of other features, multicollinearity, etc.)? I want to explain the Lasso model's behaviour. Your help is highly appreciated.
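One way to probe this, as a sketch (X, y and feature_names are assumed stand-ins for the training data): trace the full regularization path and look at the penalty level where each coefficient becomes active; within a highly correlated group, Lasso typically keeps one member and zeroes the rest.

import numpy as np
from sklearn.linear_model import lasso_path

# standardize X first if the features are on very different scales
alphas, coefs, _ = lasso_path(X, y)          # coefs has shape (n_features, n_alphas)
for name, path in zip(feature_names, coefs):
    active = alphas[np.abs(path) > 1e-12]
    # features that only enter at very small alpha are the ones Lasso drops first,
    # often because a correlated partner already explains the same variance
    print(name, 'enters the path at alpha ~', active.max() if active.size else None)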
This is the first time I have posted here. I am looking for some feedback or perspective on this question. To keep it simple, let's just talk about linear models. We know the lasso solution (least squares with an $l_1$ penalty) is the same as the Bayesian MAP estimate with an independent Laplace prior on each parameter. I'll show it here for convenience. For a vector $Y$ with $n$ observations, matrix $X$, parameters $\beta$, and noise $\epsilon$, $$Y = X\beta + \epsilon,$$ the …
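For reference, the standard form of that equivalence (assuming Gaussian noise $\epsilon \sim N(0,\sigma^2 I)$ and i.i.d. Laplace priors $p(\beta_j) \propto \exp(-|\beta_j|/b)$) is

$$\hat\beta_{\mathrm{MAP}} = \arg\max_{\beta}\,\big[\log p(Y\mid\beta) + \log p(\beta)\big] = \arg\min_{\beta}\,\frac{1}{2\sigma^2}\lVert Y - X\beta\rVert_2^2 + \frac{1}{b}\lVert\beta\rVert_1,$$

which is the lasso objective with $\lambda = 2\sigma^2/b$ (up to the usual rescaling of the objective).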
I am working on a multioutput (nr. targets: 2) regression task. The original data has a huge dimensionality (p>>n, i.e. there are far more predictors than observations), hence for the baseline models I chose to experiment with Lasso regression, wrapped in sklearn's MultiOutputRegressor. After optimizing the hyperparameters of the Lasso baseline, I wanted to look into model explainability by retrieving the coef_ of the wrapped Lasso regression model(s), but this doesn't seem to be possible. I'm now wondering how I …
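One way to get at those coefficients, as a sketch (assuming the fitted wrapper is called model and the usual X_train/y_train names): MultiOutputRegressor itself exposes no coef_, but it stores one fitted Lasso per target in its estimators_ attribute, and each of those has the usual coef_.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

model = MultiOutputRegressor(Lasso(alpha=0.1))     # alpha is illustrative
model.fit(X_train, y_train)                        # y_train has shape (n_samples, 2)

# one fitted Lasso per target; stack into a (n_targets, n_features) array
coef_matrix = np.vstack([est.coef_ for est in model.estimators_])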
As far as I know, we don't check coefficient significance in Lasso and elastic net models. Is this because insignificant feature coefficients will be driven to zero in these models? Does that mean that all the features retained by these models are significant? Why are we not checking the significance of the coefficients in Lasso and elastic net models?
Currently, I am confused about PCA and regularisation. I wonder what the difference is between PCA and regularisation, particularly lasso (L1) regression? It seems both of them can do feature selection. I have to admit I am not quite familiar with the difference between dimensionality reduction and feature selection.
I am trying to predict audio-to-video desynchronization based on a set of two arrays of length 100, which consist of corresponding audio and video samples. The problem is that my labels are single floats (values of the shift), while both the audio and video data are arrays of length 100. So far I have tried Lasso for this problem, but I couldn't get rid of errors while fitting the model. This is what my data looks like: >> print(audio) [[0.675324 ... 0.59183673, ] …
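A minimal sketch of one way to shape this for Lasso (the shifts name is an assumed stand-in for the labels): treat each (audio, video) pair as one sample with 200 features and the float shift as its target.

import numpy as np
from sklearn.linear_model import Lasso

X = np.hstack([np.asarray(audio), np.asarray(video)])   # shape (n_samples, 200)
y = np.asarray(shifts, dtype=float)                      # one shift value per sample
model = Lasso(alpha=0.01)                                # alpha is illustrative
model.fit(X, y)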
Normally we would remove features that have high pairwise correlation with another feature before performing regression. But is this step necessary if I am applying L2-regularized logistic regression (since the regularization would shrink the "irrelevant" feature coefficients toward zero anyway)?
Trying to plot the L2 regularization path of logistic regression with the following code (an example of a regularization path can be found on page 65 of the ML textbook Elements of Statistical Learning, https://web.stanford.edu/~hastie/Papers/ESLII.pdf). I have a feeling that I am doing it the dumb way; I think there is a simpler and more elegant way to code it. Suggestions much appreciated, thanks.

counter = 0
for c in np.arange(-10, 2, dtype=float):   # np.float is deprecated; plain float works
    lr = LogisticRegression(C = 10**c, fit_intercept=True, solver = …
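One somewhat tidier pattern, as a sketch (the X_train/y_train names are assumptions): sweep C on a log grid, collect coef_ into a single array, and plot that array against log10(C) to get an ESL-style coefficient profile.

import numpy as np
from sklearn.linear_model import LogisticRegression

Cs = np.logspace(-10, 2, 25)
path = np.array([
    LogisticRegression(C=C, penalty='l2', fit_intercept=True, max_iter=5000)
    .fit(X_train, y_train).coef_.ravel()
    for C in Cs
])                                   # shape (len(Cs), n_features)
# plt.plot(np.log10(Cs), path) then draws one curve per coefficient.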
I am trying to use logistic Lasso to classify documents as 1 or 0. I've tried using both the TF matrix and the TF-IDF matrix representations of the documents as my predictors. I've found that if I use the StandardScaler function in Python (standardizing features by removing the mean and scaling to unit variance) on the matrices prior to the Lasso, the model performance improves in both cases. Is it acceptable to rescale the TF or TF-IDF matrix using StandardScaler prior to …
As we all know, the cost function for linear regression is the mean squared error, i.e. the sum of squared errors divided by the number of records. Whereas when we use Ridge regression we simply add lambda*slope**2, but there I always see the cost function of linear regression written without the division by the number of records. So I just want to know which is the correct cost function. I know both are correct, but when doing Ridge or Lasso why do we ignore the division part?
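For reference, the two forms presumably being contrasted are

$$J(\beta)=\frac{1}{n}\sum_{i=1}^{n}\big(y_i-\hat y_i\big)^2 \qquad\text{versus}\qquad J(\beta)=\sum_{i=1}^{n}\big(y_i-\hat y_i\big)^2+\lambda\sum_j\beta_j^2 ;$$

they differ only by the constant factor $1/n$, and rescaling the objective by a constant does not change the minimizer, it only rescales the $\lambda$ that produces it.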
I have one question. I want to know how Adam or Adagrad treat l1-norm regularization in the loss function (e.g. Lasso). I know that the l1 norm is not differentiable at zero, but we can define a subgradient for this function. I am eager to know whether the Adam optimizer uses a subgradient in this situation or not. As far as I know, Adam builds on Adagrad, and Adagrad is a stochastic subgradient method. So, can we conclude that Adam can work …
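A minimal PyTorch sketch of the setup in question (the toy model, data shapes and penalty strength are assumptions): the l1 term is simply added to the loss, and autograd returns 0 for the derivative of |w| at w = 0, which is one valid subgradient.

import torch

model = torch.nn.Linear(10, 1)                      # assumed toy model; x: (N, 10), y: (N, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-3                                          # illustrative l1 strength

def training_step(x, y):
    opt.zero_grad()
    mse = torch.nn.functional.mse_loss(model(x), y)
    l1 = sum(p.abs().sum() for p in model.parameters())  # |w| gets subgradient 0 at w == 0
    (mse + lam * l1).backward()
    opt.step()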
Sparse methods such as LASSO contain a parameter $\lambda$ which is associated with the minimization of the $l_1$ norm. The higher the value of $\lambda$ ($>0$), the more coefficients will be shrunk to zero. What is unclear to me is how this method decides which coefficients to shrink to zero. If $\lambda = 0.5$, does it mean that those coefficients whose values are less than or equal to 0.5 will become zero? So in other words, whatever …
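For reference, in the special case of an orthonormal design the lasso has a closed-form solution that makes the thresholding explicit (up to the scaling convention of the objective):

$$\hat\beta_j^{\text{lasso}}=\operatorname{sign}\!\big(\hat\beta_j^{\text{OLS}}\big)\big(|\hat\beta_j^{\text{OLS}}|-\lambda\big)_+ ,$$

so the cutoff is applied to the ordinary least-squares coefficients (soft-thresholding), not to the lasso coefficients themselves, and in the general correlated case there is no such simple per-coefficient rule.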
First of all, I'm new to lasso regression, so sorry if this feels stupid. I'm trying to build a regression model and wanted to use lasso regression for feature selection as I have quite a few features to start with. I started by standardizing all features and plotting the weights of each feature as I changed my regularisation strength to see which ones are most important. I also plotted the RMSE on the holdout set to find a U-shaped plot, …
I was practicing Lasso regression with the SPARCS hospital dataset. There are two kinds of features in the dataset: categorical features (location of the hospital, demographics of patients, etc.) and ordinal features (length of stay, severity of disease, rate of mortality, etc.). When processing the dataset I created new features by one-hot encoding the categorical features into, let us say, an X_cardi DataFrame, and by generating polynomial features for the ordinal features in an X_ordi DataFrame. X_combined = pd.concat([X_ordi, …
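A self-contained sketch of the same preprocessing done with a ColumnTransformer (the column names, polynomial degree and alpha are assumptions), which avoids concatenating DataFrames by hand and keeps the encoding inside one pipeline with the Lasso:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

categorical = ['hospital_county', 'age_group']                 # assumed column names
ordinal = ['length_of_stay', 'severity', 'mortality_risk']     # assumed column names

preprocess = ColumnTransformer([
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical),
    ('ord', make_pipeline(PolynomialFeatures(degree=2), StandardScaler()), ordinal),
])
model = make_pipeline(preprocess, Lasso(alpha=0.1))            # alpha is illustrative
# model.fit(X_train, y_train)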