How to compare two multivariate methods for filling NAs

In the Titanic dataset, I used two methods to fill the NAs in Age. The first one is regression using Lasso:

    from sklearn.linear_model import Lasso

    AgefillnaModel = Lasso(copy_X=False)
    AgefillnaModel_X.dropna(inplace=True)
    y = DF.Age.dropna(inplace=False)
    AgefillnaModel.fit(AgefillnaModel_X, y)
    DF.loc[ageNaIn, 'Age'] = AgefillnaModel.predict(DF.loc[ageNaIn, AgefillnaModel_X.columns])

and the second method is IterativeImputer() from sklearn.impute:

    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer
    from sklearn.impute import IterativeImputer

    # Setting the random_state argument for reproducibility
    imputer = IterativeImputer(random_state=42)
    imputed = imputer.fit_transform(DF)
    df_imputed = pd.DataFrame(imputed, columns=DF.columns)
    round(df_imputed, 2)

Now, how can I decide which one is better? Here is the result of the scattered Age …
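One way to compare the two imputers on the same footing is to hide a random subset of the known Age values, fill them in with each method, and score the imputations against the held-out truth. A minimal sketch, assuming DF contains only numeric columns as in the IterativeImputer snippet above; the helper name mask_and_score is made up for illustration:

    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer
    from sklearn.impute import IterativeImputer
    from sklearn.metrics import mean_squared_error

    def mask_and_score(df, column, imputer, frac=0.2, seed=0):
        # Hide a fraction of the known values in `column`, impute, return RMSE.
        known = df[df[column].notna()].copy()
        hidden = known.sample(frac=frac, random_state=seed).index
        truth = known.loc[hidden, column]
        masked = known.copy()
        masked.loc[hidden, column] = np.nan
        filled = pd.DataFrame(imputer.fit_transform(masked),
                              index=masked.index, columns=masked.columns)
        return mean_squared_error(truth, filled.loc[hidden, column]) ** 0.5

    print(mask_and_score(DF, 'Age', IterativeImputer(random_state=42)))

The Lasso-based filler can be wrapped in the same interface (any object with fit_transform) so both methods are scored on identical hidden entries; the one with the lower RMSE is the better imputer for this column.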
Category: Data Science

Regularizing the intercept - particular case

Yesterday I posted this thread, Regularizing the intercept, where I had a question about penalizing the intercept. In short, I asked whether there exist cases where penalizing the intercept leads to a lower expected prediction error, and the answer was: of course there exist scenarios where it makes sense to penalize the intercept, if that aligns with domain knowledge. However, in the real world we more often do not just penalize the magnitude of the intercept, but enforce it to be zero. …
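For concreteness, the "enforce it to be zero" option is just fit_intercept=False in scikit-learn, usually combined with centering so the constraint costs nothing. A minimal sketch on synthetic data (all names local to this example):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

    # After centering, the optimal intercept is exactly zero, so fixing it
    # to zero penalizes nothing; the original intercept is recovered below.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    model = Ridge(alpha=1.0, fit_intercept=False).fit(Xc, yc)
    intercept = y.mean() - X.mean(axis=0) @ model.coef_
    print(model.coef_, intercept)   # slopes near (1, -2, 0.5); intercept near 3

This is different from actually penalizing the intercept, which shrinks it toward zero rather than solving the centered problem.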
Category: Data Science

Generating artificial data to extend learning set

I have a dataset containing 42 instances (X) and one target Y on which I want to perform LASSO regression. All features are continuous and numerical. As the sample size is small, I wish to extend it. I am aware of algorithms like SMOTE that are used for extending imbalanced datasets. Is there anything available for my case, where there is no imbalance?
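SMOTE interpolates between minority-class neighbours; for a plain regression task the simplest analogue is jittering: bootstrap rows and add small Gaussian noise scaled to each feature's spread. A minimal sketch under those assumptions (the function name jitter_augment is made up):

    import numpy as np

    def jitter_augment(X, y, n_new=100, noise_scale=0.05, seed=0):
        # Resample rows with replacement, then perturb each feature by a
        # small fraction of its standard deviation.
        rng = np.random.default_rng(seed)
        idx = rng.integers(0, len(X), size=n_new)
        noise = rng.normal(scale=noise_scale * X.std(axis=0),
                           size=(n_new, X.shape[1]))
        return X[idx] + noise, y[idx]

    # X: (42, p) array, y: (42,) array
    # X_aug, y_aug = jitter_augment(X, y)

Note that augmented copies leak information across folds, so generate them from the training split only, after any train/test split.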
Category: Data Science

How to extract MSEP or RMSEP from lassoCV?

I'm doing lasso and ridge regression in R with the package chemometrics. With ridgeCV it is easy to extract the SEP and MSEP values via model.ridge$RMSEP and model.ridge$SEP. But how can I do this with lassoCV? model.lasso$SEP works, but there is no RMSE or MSE entry in the list. However, the function produces a plot with MSEP and SEP in the legend, so it must be possible to extract both values. But how? SEP = standard error of the predictions; …
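If only the residual-based quantities are available, MSEP can usually be reconstructed from the cross-validated prediction errors $e_i$ directly, since the usual chemometrics definitions are related (exactly, up to the $n$ versus $n-1$ factor in SEP):

$$\text{MSEP} = \frac{1}{n}\sum_{i=1}^{n} e_i^2, \qquad \text{SEP} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(e_i - \bar{e})^2}, \qquad \text{MSEP} \approx \text{bias}^2 + \text{SEP}^2,$$

with $\text{bias} = \bar{e}$ and $\text{RMSEP} = \sqrt{\text{MSEP}}$.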
Category: Data Science

Why is the L2 penalty squared but the L1 penalty isn't in elastic-net regression?

There was a data set I worked with on which I wanted to solve non-negative least squares (NNLS), and I wanted a sparse model. After a bit of experimenting I found that what worked best for me was using the following loss function: $$\min_{x \geq 0} ||Ax-b|| + \lambda_1||x||_2^2+\lambda_2||x||_1^2$$ where the squared L2 penalty was implemented by adding white noise with a standard deviation of $\sqrt{\lambda_1}$ to $A$ (which can be shown to be equivalent to ridge regression …
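The noise-augmentation trick mentioned here can be checked in expectation. Assuming a squared data term and a noise matrix $E$ with i.i.d. zero-mean entries of variance $\lambda_1$, independent of everything else, the cross term vanishes and

$$\mathbb{E}\,\|(A+E)x - b\|_2^2 = \|Ax - b\|_2^2 + \mathbb{E}\,\|Ex\|_2^2 = \|Ax - b\|_2^2 + m\,\lambda_1\|x\|_2^2,$$

where $m$ is the number of rows of $A$; so the effective ridge weight picks up a factor of $m$, which rescaling the noise by $1/\sqrt{m}$ removes.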
Category: Data Science

Why is gridsearchCV.best_estimator_.score giving me r2_score even though I specified MAE as my main scoring metric?

I have a lasso regression model with the following definition:

    import sklearn
    from sklearn.model_selection import train_test_split, cross_val_score, KFold, GridSearchCV
    from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures, scale
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LinearRegression, Lasso
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import r2_score

    folds = KFold(n_splits=5, shuffle=True, random_state=100)

    # specify range of hyperparameters
    hyper_params = …
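The reason the score looks like R² is that every scikit-learn regressor's .score method is hard-wired to return R², regardless of the scoring= argument given to GridSearchCV; the chosen metric only drives the search itself. A minimal sketch of how to read the MAE instead, assuming the search object from the truncated snippet is called grid (the estimator and data names here are placeholders):

    from sklearn.metrics import mean_absolute_error

    # grid = GridSearchCV(lasso_pipeline, hyper_params,
    #                     scoring='neg_mean_absolute_error', cv=folds)
    # grid.fit(X_train, y_train)

    print(grid.best_score_)   # best CV score under the chosen metric (negated MAE)
    print(grid.best_estimator_.score(X_test, y_test))   # always R^2 for regressors
    print(mean_absolute_error(y_test, grid.best_estimator_.predict(X_test)))   # MAE explicitly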
Category: Data Science

Is it possible to explain why Lasso models eliminated certain coefficients?

Is it possible to understand why Lasso models eliminated specific coefficients? During modelling, many of the highly correlated features in the data are eliminated by Lasso regression. Is it possible to say why precisely these features are eliminated from the model? (Is it the presence of other features, multicollinearity, etc.?) I want to explain the Lasso model's behaviour. Your help is highly appreciated.
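One concrete way to probe this is to trace every coefficient as the penalty grows: among a group of highly correlated features, Lasso typically keeps one representative and drives the others to zero early on the path. A minimal sketch, with X, y standing in for the question's training data:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import lasso_path

    # alphas: descending penalty grid; coefs: shape (n_features, n_alphas)
    alphas, coefs, _ = lasso_path(X, y)

    for j, path in enumerate(coefs):
        plt.plot(np.log10(alphas), path, label=f'feature {j}')
    plt.xlabel('log10(alpha)')
    plt.ylabel('coefficient')
    plt.legend()
    plt.show()

Where a coefficient's path collapses to zero just as a correlated feature's path grows, the elimination is driven by that correlation rather than by irrelevance.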
Category: Data Science

Lasso (or Ridge) vs Bayesian MAP

This is the first time I have posted here. I am looking for some feedback or perspective on this question. To make it simple, let's just talk about linear models. We know that the solution of the $l_1$-penalized least-squares objective is the same as the Bayesian MAP estimate with a Laplace prior on each parameter. I'll show it here for convenience. For a vector $Y$ with $n$ observations, design matrix $X$, parameters $\beta$, and noise $\epsilon$, $$Y = X\beta + \epsilon,$$ the …
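For reference, the correspondence the question invokes follows in two lines. Assuming Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ and independent Laplace priors $p(\beta_j) \propto \exp(-|\beta_j|/b)$, the negative log-posterior is

$$-\log p(\beta \mid Y) = \frac{1}{2\sigma^2}\|Y - X\beta\|_2^2 + \frac{1}{b}\|\beta\|_1 + \text{const},$$

so the MAP estimate minimizes $\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$ with $\lambda = 2\sigma^2/b$; swapping the Laplace prior for a Gaussian gives ridge by the same argument.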
Category: Data Science

Accessing regression coefficients when using MultiOutputRegressor

I am working on a multioutput regression task (number of targets: 2). The original data has huge dimensionality (p >> n, i.e. there are far more predictors than observations), hence for the baseline models I chose to experiment with Lasso regression wrapped in sklearn's MultiOutputRegressor. After optimizing the hyperparameters of the Lasso baseline, I wanted to look into model explainability by retrieving the coef_ of the wrapped Lasso regression model(s), but this doesn't seem to be possible. I'm now wondering how I …
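For what it's worth, MultiOutputRegressor keeps one fitted clone per target in its estimators_ attribute, so the per-output coefficients are reachable after fitting. A minimal sketch on synthetic data:

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.multioutput import MultiOutputRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 100))   # p >> n, as in the question
    Y = rng.normal(size=(30, 2))     # two targets

    model = MultiOutputRegressor(Lasso(alpha=0.1)).fit(X, Y)

    # One Lasso per target, each with its own coef_ of shape (n_features,)
    for k, est in enumerate(model.estimators_):
        print(k, est.coef_.shape, np.count_nonzero(est.coef_))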
Category: Data Science

Why are we not checking the significance of the coefficients in Lasso and elastic net models

As far as I know, we don't check coefficient significance in Lasso and elastic net models. Is it because insignificant feature coefficients will be driven to zero in these models? Does that mean that all the features remaining in these models are significant? Why are we not checking the significance of the coefficients in Lasso and elastic net models?
Category: Data Science

Difference between PCA and regularisation

Currently I am confused about PCA and regularisation. What is the difference between PCA and regularisation, particularly lasso (L1) regression? It seems both of them can do feature selection. I have to admit, I am not quite familiar with the difference between dimensionality reduction and feature selection.
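The distinction shows up directly in the fitted objects: PCA builds dense linear combinations of all features (dimensionality reduction, unsupervised), while lasso zeroes out whole columns (feature selection, driven by the target). A minimal sketch on synthetic data:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

    pca = PCA(n_components=2).fit(X)
    print(np.count_nonzero(pca.components_))   # 20: every feature enters each component

    lasso = Lasso(alpha=0.1).fit(X, y)
    print(np.count_nonzero(lasso.coef_))       # typically 2: irrelevant features dropped

So PCA never discards an original feature, it reweights all of them; lasso literally selects.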
Category: Data Science

Predicting single floats based on set of 2 feature arrays each of 100 values

I am trying to predict audio-to-video desynchronization based on a set of two arrays of length 100, which consist of corresponding audio and video samples. The problem is that my labels are single floats (values of the shift), while both the audio and video data are arrays of length 100. So far I have tried Lasso for this problem, but I couldn't get rid of errors while fitting the model. This is how my data looks:

    >> print(audio)
    [[0.675324 ... 0.59183673, ] …
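scikit-learn regressors expect a single 2-D matrix of shape (n_samples, n_features), so the usual fix is to concatenate each example's audio and video arrays into one 200-feature row. A minimal sketch, assuming audio and video have shape (n_samples, 100) and shift holds one float label per example:

    import numpy as np
    from sklearn.linear_model import Lasso

    X = np.hstack([np.asarray(audio), np.asarray(video)])   # -> (n_samples, 200)
    y = np.asarray(shift, dtype=float)                      # -> (n_samples,)

    model = Lasso(alpha=0.01).fit(X, y)
    print(model.predict(X[:5]))

If fit still errors, check X.shape and y.shape first; a mismatch between their first dimensions is the most common cause.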
Category: Data Science

Do I have to remove features with pairwise correlation even if I am doing a regularized logistic regression?

Normally we would remove features that have high pairwise correlation with another feature before performing regression. But is this step necessary if I am applying L2-regularized logistic regression (since the regularization algorithm would shrink the "irrelevant" feature coefficients towards zero anyway)?
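One way to see what L2 actually does with correlated inputs is to duplicate a feature and watch the weight get split rather than one copy being dropped. A minimal sketch:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 1))
    X = np.hstack([x, x])   # two perfectly correlated copies of one feature
    y = (x[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

    clf = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
    print(clf.coef_)   # the weight is shared roughly equally; neither copy is zeroed

So L2 shrinks and spreads coefficients across correlated features but does not eliminate them; exact zeros are an L1 behaviour.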
Category: Data Science

Elegant way to plot the L2 regularization path of logistic regression in python?

Trying to plot the L2 regularization path of logistic regression with the following code (an example of a regularization path can be found on page 65 of the ML textbook The Elements of Statistical Learning, https://web.stanford.edu/~hastie/Papers/ESLII.pdf). I have a feeling that I am doing it the dumb way; I think there is a simpler and more elegant way to code it. Suggestions much appreciated, thanks.

    counter = 0
    for c in np.arange(-10, 2, dtype=float):
        lr = LogisticRegression(C=10**c, fit_intercept=True, solver=…
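One tidier pattern is to collect all coefficients into a single array and plot once, reusing the estimator with warm_start=True so each fit starts from the previous solution. A sketch under those assumptions, with X, y standing in for the question's data:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression

    Cs = 10.0 ** np.arange(-10, 2)
    lr = LogisticRegression(penalty='l2', warm_start=True, max_iter=5000)

    coefs = []
    for C in Cs:
        lr.set_params(C=C)
        coefs.append(lr.fit(X, y).coef_.ravel().copy())

    plt.plot(np.log10(Cs), np.array(coefs))   # one line per feature
    plt.xlabel('log10(C)')
    plt.ylabel('coefficient')
    plt.show()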
Category: Data Science

Can I rescale TF matrix or TF-IDF matrix using StandardScaler prior to Logisitc Lasso regression?

I am trying to use logistic Lasso to classify documents as 1 or 0. I've tried using both the TF matrix and the TF-IDF matrix representations of the documents as my predictors. I've found that if I use the StandardScaler class in Python (standardizing features by removing the mean and scaling to unit variance) on the matrices prior to Lasso, the model performance improves in both cases. Is it acceptable to rescale the TF or TF-IDF matrix using StandardScaler prior to …
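One practical caveat: TF and TF-IDF matrices are typically sparse, and subtracting the mean would densify them, so scikit-learn requires with_mean=False on sparse input (scaling to unit variance only). A minimal sketch of the full pipeline, with docs and labels as stand-ins for the corpus:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    clf = make_pipeline(
        TfidfVectorizer(),
        StandardScaler(with_mean=False),   # mean-centering would densify the sparse matrix
        LogisticRegression(penalty='l1', solver='liblinear'),
    )
    # clf.fit(docs, labels)   # docs: list of strings; labels: 0/1

Putting the scaler inside the pipeline also ensures it is fit on training folds only during cross-validation.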
Category: Data Science

What's the correct cost function for Linear Regression

As we all know, the cost function for linear regression is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2,$$

whereas when we use ridge regression we simply add $\lambda \cdot \text{slope}^2$. But I always see the form below given as the cost function of linear regression, where it is not divided by the number of records:

$$J(\theta) = \sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2.$$

So I just want to know which is the correct cost function. I know both are correct, but when doing ridge or Lasso, why do we ignore the division part?
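For what it's worth, the two forms differ only by the constant factor $1/m$, which does not change where the minimum is; once a penalty is added, the factor merely rescales the effective regularization strength:

$$\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda'\,\text{penalty}(\theta) \quad\text{and}\quad \sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\,\text{penalty}(\theta)$$

have the same minimizer whenever $\lambda = m\,\lambda'$.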
Category: Data Science

Can Adagrad or Adam be used in loss function with l1-norm regularization?

There is one question for me: how do Adam or Adagrad treat l1-norm regularization in the loss function (e.g. Lasso)? I know that the l1 norm is not differentiable at zero, but we can define a subgradient for this function. I am eager to know whether the Adam optimizer utilizes a subgradient in this situation or not. As far as I know, Adam builds on Adagrad's benefits, and Adagrad is a stochastic subgradient method. So, can we conclude that Adam can work …
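For intuition, the usual convention is the subgradient sign(w) with sign(0) = 0, which is effectively what automatic differentiation of |w| hands to an optimizer like Adam; proximal methods instead handle the kink exactly via soft-thresholding. A minimal numpy sketch of both ingredients (illustrative only):

    import numpy as np

    def l1_subgradient(w):
        # One valid subgradient of ||w||_1: sign(w), choosing 0 at the kink.
        return np.sign(w)

    def soft_threshold(w, lam):
        # Proximal operator of lam * ||w||_1: shrinks, and sets small entries exactly to 0.
        return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

    w = np.array([-1.5, -0.05, 0.0, 0.3])
    print(l1_subgradient(w))        # [-1. -1.  0.  1.]
    print(soft_threshold(w, 0.1))   # [-1.4 -0.   0.   0.2]

Because a plain (sub)gradient step almost never lands exactly on zero, subgradient-based optimizers tend to produce small-but-nonzero weights, while proximal updates yield exact sparsity.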
Category: Data Science

What is the meaning of the sparsity parameter

Sparse methods such as LASSO contain a parameter $\lambda$ which is associated with the minimization of the $l_1$ norm. The higher the value of $\lambda$ ($>0$), the more coefficients are shrunk to zero. What is unclear to me is how this method decides which coefficients to shrink to zero. If $\lambda = 0.5$, does it mean that those coefficients whose values are less than or equal to 0.5 will become zero? So, in other words, whatever …
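In the special case of an orthonormal design ($X^\top X = I$) there is a closed form that makes the mechanism explicit: for the objective $\frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, each Lasso coefficient is the least-squares estimate, soft-thresholded at $\lambda$:

$$\hat{\beta}_j^{\text{lasso}} = \operatorname{sign}\big(\hat{\beta}_j^{\text{OLS}}\big)\,\max\big(|\hat{\beta}_j^{\text{OLS}}| - \lambda,\ 0\big).$$

So it is the least-squares estimates with magnitude below $\lambda$, not the final coefficients below some cutoff, that are set to zero; with correlated features there is no such simple rule and the selection also depends on the other predictors.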
Category: Data Science

Lasso regression not getting better without random features

First of all, I'm new to lasso regression, so sorry if this sounds stupid. I'm trying to build a regression model and wanted to use lasso regression for feature selection, as I have quite a few features to start with. I started by standardizing all features and plotting the weight of each feature as I changed the regularisation strength, to see which ones are most important. I also plotted the RMSE on the holdout set to find a U-shaped plot, …
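scikit-learn can produce both curves in one pass: LassoCV stores the per-fold validation error for every alpha in mse_path_, next to the selected alpha_. A minimal sketch, with X, y standing in for the standardized data:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LassoCV

    model = LassoCV(cv=5).fit(X, y)

    # mse_path_ has shape (n_alphas, n_folds); average folds for the U-shaped curve
    plt.plot(np.log10(model.alphas_), model.mse_path_.mean(axis=1))
    plt.axvline(np.log10(model.alpha_), linestyle='--')   # alpha chosen by CV
    plt.xlabel('log10(alpha)')
    plt.ylabel('mean CV MSE')
    plt.show()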
Category: Data Science

How to handle both categorical and ordinal features in a single dataset?

I was practicing Lasso regression with the SPARCS hospital dataset. There are two kinds of features in the dataset: categorical features like the location of the hospital, demographics of patients, etc., and ordinal features like the length of stay, the severity of disease, rate of mortality, etc. When processing the dataset I created new features by one-hot encoding the categorical features into, let us say, an X_cardi DataFrame, and by generating polynomial features for the ordinal features in an X_ordi DataFrame. X_combined = pd.concat([X_ordi, …
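An alternative to concatenating hand-built frames is ColumnTransformer, which applies a different encoder to each column group and keeps everything in one pipeline. A minimal sketch, with the column lists categorical_cols and ordinal_cols assumed from the question's setup:

    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import Lasso
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

    preprocess = ColumnTransformer([
        ('onehot', OneHotEncoder(handle_unknown='ignore'), categorical_cols),
        ('poly', PolynomialFeatures(degree=2, include_bias=False), ordinal_cols),
    ])
    model = make_pipeline(preprocess, Lasso(alpha=0.1))
    # model.fit(X, y)   # X: the raw DataFrame; y: the target

This avoids the index-alignment bugs that pd.concat can introduce when rows have been dropped from one frame but not the other.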
Category: Data Science
