Constraining linear regressor parameters in scikit-learn?

I'm using sklearn.linear_model.Ridge to use ridge regression to extract the coefficients of a polynomial. However, some of the coefficients have physical constraints that require them to be negative. Is there a way to impose a constraint on those parameters? I haven't spotted one in the documentation...

As a workaround of sorts, I have tried making many fits using different complexity parameters (see toy code below) and selecting the one with coefficients that satisfy the physical constraint, but this is too unreliable to use in production.

# Preliminaries
import numpy as np
from sklearn.linear_model import Ridge

n_alphas = 2000
alphas = np.logspace(-15, 3, n_alphas)

# Perform fit: one ridge model per regularization strength, keyed by alpha,
# storing the fitted model together with its test-set R^2 score
fits = {}
for alpha in alphas:
    temp_ridge = Ridge(alpha=alpha, fit_intercept=False)
    temp_ridge.fit(indep_training_data, dep_training_data)
    temp_ridge_R2 = temp_ridge.score(indep_test_data, dep_test_data)
    fits[alpha] = [temp_ridge, temp_ridge_R2]

Is there a way to impose a sign constraint on some of the parameters using ridge regression? Thanks!

Topic ridge-regression linear-regression regression scikit-learn

Category Data Science


It is possible to constrain a linear regression in scikit-learn to only positive coefficients. sklearn.linear_model.LinearRegression has a positive=True option, described in the documentation as:

When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.

The positive=True option is not available for ridge regression in scikit-learn.
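For plain least squares the option works directly, and the same mechanism can be used to force negative coefficients by fitting on the negated targets and flipping the sign of the resulting coefficients. A minimal sketch (the data arrays below are placeholders, not from the original question):

import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data with known negative coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([-2.0, -0.5, -1.0]) + 0.1 * rng.normal(size=100)

# Built-in non-negative fit
pos_model = LinearRegression(positive=True, fit_intercept=False).fit(X, y)

# Non-positive fit: train on -y with positive=True, then negate the coefficients
neg_model = LinearRegression(positive=True, fit_intercept=False).fit(X, -y)
neg_coefs = -neg_model.coef_   # guaranteed to be <= 0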


I am assuming a linear regression of the form

$$y = w_0 x_0 + w_1 x_1 + \ldots + w_p x_p + \varepsilon.$$

If we collect all output observations into a single vector $\mathbf{y}$ and let $\mathbf{X}$ denote the data matrix augmented with a column of ones on the left, then we can express the linear regression as

$$\mathbf{y} = \mathbf{X}\mathbf{w} + \mathbf{\varepsilon},$$

in which $\mathbf{w}=[w_0, w_1,\ldots,w_p]^T$ and $\varepsilon$ is the vector of model errors. Applying the ridge regression loss to this model and simplifying the expressions yields the following loss function (up to an additive constant):

$$E(\mathbf{w}) = \dfrac{1}{2} \mathbf{w}^T\left[\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I} \right]\mathbf{w} + \left[-\mathbf{X}^T\mathbf{y} \right]^T\mathbf{w}$$
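This follows from expanding the standard ridge objective and dropping the term that does not depend on $\mathbf{w}$:

$$\dfrac{1}{2}\left\|\mathbf{y}-\mathbf{X}\mathbf{w}\right\|^2 + \dfrac{\lambda}{2}\left\|\mathbf{w}\right\|^2 = \dfrac{1}{2}\mathbf{w}^T\left[\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}\right]\mathbf{w} - \mathbf{y}^T\mathbf{X}\mathbf{w} + \dfrac{1}{2}\mathbf{y}^T\mathbf{y},$$

where $-\mathbf{y}^T\mathbf{X}\mathbf{w} = \left[-\mathbf{X}^T\mathbf{y}\right]^T\mathbf{w}$ and the constant $\frac{1}{2}\mathbf{y}^T\mathbf{y}$ does not affect the minimizer.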

Our goal is to minimize this expression subject to the additional constraints on the coefficients. If we require all coefficients to be non-positive, we obtain the inequality constraint

$$\mathbf{I}\mathbf{w} \preceq \mathbf{0}.$$

Hence, we have obtained a quadratic programming formulation of the problem.

$$\text{minimize: } E(\mathbf{w}) = \dfrac{1}{2} \mathbf{w}^T\left[\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I} \right]\mathbf{w} + \left[-\mathbf{X}^T\mathbf{y} \right]^T\mathbf{w}$$

$$\text{subject to: } \mathbf{I}\mathbf{w} \preceq \mathbf{0}$$

You can solve this kind of problem quite straightforwardly with cvxopt for Python. You can also add more complicated linear constraints (both equality and inequality constraints).

Note: CVXOPT uses $\mathbf{x}$ for the unknowns, which are $\mathbf{w}$ in my formulation.
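A minimal sketch of the quadratic program with cvxopt (the data arrays are placeholders standing in for your real design matrix and targets, and $\lambda$ is chosen arbitrarily for illustration):

import numpy as np
from cvxopt import matrix, solvers

# Placeholder data; substitute your own design matrix X and targets y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([-2.0, -0.5, -1.0]) + 0.1 * rng.normal(size=100)

lam = 1.0                               # ridge penalty, arbitrary here
p = X.shape[1]

# QP data: minimize 1/2 w^T P w + q^T w  subject to  G w <= h
P = matrix(X.T @ X + lam * np.eye(p))
q = matrix((-X.T @ y).reshape(-1, 1))
G = matrix(np.eye(p))                   # one row per coefficient
h = matrix(np.zeros((p, 1)))            # enforces w_i <= 0 for every i

solution = solvers.qp(P, q, G, h)
w = np.array(solution['x']).ravel()     # sign-constrained ridge coefficients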
