scikit-learn OMP MemoryError
I tried to use the OMP algorithm available in scikit-learn. My total data size, including both the target signal and the dictionary, is about 1 GB. However, when I ran the code it exited with a MemoryError. The machine has 16 GB of RAM, so I don't see why this should happen. I added some logging to track down where the error occurs and found that the data loads completely into NumPy arrays; it is the algorithm itself that triggers the error. Can someone help me with this or suggest a more memory-efficient algorithm for feature selection, or is subsampling the data my only option? Are there any good deterministic subsampling techniques?
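To make the last question concrete, this is the kind of deterministic subsampling I have in mind (a minimal sketch; it assumes mydata is already loaded as a NumPy array, and the step size k is just an illustrative value):

k = 10                        # illustrative step size, not tuned
subsample = mydata[::k, :]    # keep every k-th row: deterministic, no random seed involved
print "subsample shape:", subsample.shape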
EDIT: Relevant code piece:
from sklearn.linear_model import OrthogonalMatchingPursuit

n = 8
y = mydata[:, 0]                          # target signal (mydata is loaded earlier)
X = mydata[:, [1, 2, 3, 4, 5, 6, 7, 8]]  # dictionary columns; fancy indexing makes a copy
# print y
# print X
print "here"

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, copy_X=False, normalize=True)
omp.fit(X, y)
coef = omp.coef_
print omp.coef_

idx_r, = coef.nonzero()
for idx in idx_r:
    print coef[idx], vars[idx]            # vars holds the column names, loaded elsewhere
The error I get:
File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 324, in score
return r2_score(y, self.predict(X), sample_weight=sample_weight)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/metrics.py", line 2332, in r2_score
numerator = (weight * (y_true - y_pred) ** 2).sum(dtype=np.float64)
MemoryError
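For reference, the kind of quick check that shows the arrays themselves load fine looks roughly like this (a sketch; it assumes mydata is a float64 NumPy array and that X and y are the arrays from the snippet above):

# rough memory footprint of the loaded arrays, in GB
print "mydata:", mydata.nbytes / 1e9, "GB"
print "X:", X.nbytes / 1e9, "GB"   # X is a copy of the selected columns
print "y:", y.nbytes / 1e9, "GB"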
Topic scikit-learn feature-selection python scalability bigdata
Category Data Science