scikit-learn OMP mem error

Question

sshanks

May 16, 2022, 13:02

I tried to use the OMP (Orthogonal Matching Pursuit) algorithm available in scikit-learn. My total data size, including both the target signal and the dictionary, is about 1 GB. However, when I ran the code, it exited with a memory error. The machine has 16 GB of RAM, so I don't think this should have happened. I added some logging to see where the error occurred and found that the data was loaded completely into NumPy arrays; it was the algorithm itself that caused the error. Can someone help me with this or suggest a more memory-efficient algorithm for feature selection, or is subsampling the data my only option? Are there any good deterministic subsampling techniques?

EDIT: Relevant code piece:

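from sklearn.linear_model import OrthogonalMatchingPursuit

n = 8
y = mydata[:, 0]                         # target signal: first column
X = mydata[:, [1, 2, 3, 4, 5, 6, 7, 8]]  # dictionary: columns 1 to 8
# print(y)
# print(X)
print("here")
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, copy_X=False, normalize=True)
omp.fit(X, y)
coef = omp.coef_
print(coef)
idx_r, = coef.nonzero()  # indices of the selected (non-zero) coefficients
for i in idx_r:
    # vars is defined elsewhere in the script
    print("%s %s\n" % (coef[i], vars[i]))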

The error I get:

  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 324, in score
    return r2_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/metrics.py", line 2332, in r2_score
    numerator = (weight * (y_true - y_pred) ** 2).sum(dtype=np.float64)
MemoryError

Topic scikit-learn feature-selection python scalability bigdata

Category Data Science

Brian Spiering · Accepted Answer · April 9, 2022, 15:56

One option is to set precompute to True, which precomputes the Gram matrix (X^T X) and the X^T y product, so the solver works with those instead of the full X. It would be something like:

from sklearn.linear_model import OrthogonalMatchingPursuit

omp = OrthogonalMatchingPursuit(precompute=True, n_nonzero_coefs=5, copy_X=False, normalize=True)
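To illustrate what precomputing buys you, here is a minimal sketch that accumulates the Gram matrix and X^T y over row chunks and then calls scikit-learn's `orthogonal_mp_gram` directly. The synthetic data, the chunk size, and the names `gram`, `xy`, and `w_true` are assumptions made for the example, not part of the question. Unlike the estimator, this function does not center the data or fit an intercept, and it assumes the columns have comparable norms.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp_gram

# Synthetic stand-in for the real data (assumption): 8 features, 5 of them active.
rng = np.random.RandomState(0)
n_samples, n_features = 1000, 8
X = rng.randn(n_samples, n_features)
w_true = np.array([3.0, -2.0, 0.0, 1.5, 0.0, 0.0, 4.0, -2.5])
y = X.dot(w_true)

# Accumulate X^T X and X^T y chunk by chunk; only one chunk of rows
# is touched at a time, and the results are tiny (8x8 and length 8).
gram = np.zeros((n_features, n_features))
xy = np.zeros(n_features)
chunk_size = 100
for start in range(0, n_samples, chunk_size):
    X_chunk = X[start:start + chunk_size]
    y_chunk = y[start:start + chunk_size]
    gram += X_chunk.T.dot(X_chunk)
    xy += X_chunk.T.dot(y_chunk)

# Run OMP on the precomputed quantities only.
coef = orthogonal_mp_gram(gram, xy, n_nonzero_coefs=5)
print(coef)
```

With real data the chunks could come from a memory-mapped file or a reader loop instead of slicing an in-memory array, so the full matrix never has to be resident at once.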

Also upgrading to Python 3 might help with memory issues.
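Separately, the traceback in the question ends inside `r2_score`, on a line that builds full-length temporary arrays, so the failure appears to happen while scoring rather than while fitting. Assuming that is where memory runs out, the unweighted R² can be computed in chunks instead of calling `score`. The helper `chunked_r2` below is a hypothetical function written for this sketch, not a scikit-learn API, and the data and the `predict` callable are made up for the example.

```python
import numpy as np

def chunked_r2(predict, X, y, chunk_size=100000):
    """Unweighted R^2 computed over row chunks (hypothetical helper)."""
    y_mean = y.mean(dtype=np.float64)
    ss_res = 0.0  # residual sum of squares
    ss_tot = 0.0  # total sum of squares around the mean
    for start in range(0, y.shape[0], chunk_size):
        stop = start + chunk_size
        y_chunk = y[start:stop]
        # Temporaries here are only chunk-sized, not full-length.
        resid = y_chunk - predict(X[start:stop])
        centered = y_chunk - y_mean
        ss_res += float(np.dot(resid, resid))
        ss_tot += float(np.dot(centered, centered))
    return 1.0 - ss_res / ss_tot

# Small synthetic check (assumption): a known linear model plus noise.
rng = np.random.RandomState(0)
X = rng.randn(1000, 8)
w = np.array([3.0, -2.0, 0.0, 1.5, 0.0, 0.0, 4.0, -2.5])
y = X.dot(w) + 0.1 * rng.randn(1000)

r2 = chunked_r2(lambda X_chunk: X_chunk.dot(w), X, y, chunk_size=128)
print(r2)
```

With a fitted estimator, `omp.predict` could be passed as the `predict` argument.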
