Optimizing an averaged perceptron algorithm using numpy and scipy instead of dictionaries
So I'm trying to write an averaged perceptron algorithm (page 48 here for the equation) in Python. Instead of storing the historical weight vectors, I simply accumulate the weights multiplied by a consistency counter $c$; that accumulator is the variable w_accum.
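To spell out the trick: every time w is about to change, I add $c \cdot w$ to the accumulator, where $c$ counts how many consecutive examples the current w survived, so that (as I read the equation)
$$\bar{w} \;=\; \frac{1}{T}\sum_{t=1}^{T} w_t \;=\; \frac{1}{T}\sum_{k} c_k\, w^{(k)},$$
where the $w^{(k)}$ are the distinct weight vectors seen during training, $c_k$ is the number of examples each one survived, and $T$ is the total number of examples processed; w_accum holds the running sum $\sum_k c_k\, w^{(k)}$.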
My implementation initially had the weight vectors and x represented as dictionaries in which a feature appears only if it is active; that was the most efficient approach I could think of. Here is that code:
def train(self, x_train, y_train):
    # features, iterations and eta are assumed to be defined elsewhere (e.g. on self)
    # Initialize w and the bias
    self.w, self.w['bias'] = {feature: 0.0 for feature in features}, 0.0
    self.w_accum, self.w_accum['bias'] = {feature: 0.0 for feature in features}, 0.0
    c = 0
    # Iterate over the training data n times
    for it in range(iterations):
        # Check each training example
        for i in range(len(x_train)):
            xi, yi = x_train[i], y_train[i]
            y_hat = self.predict(xi)
            # Update weights if there is a misclassification
            if yi != y_hat:
                # self.w_accum += self.w*c (basically that's what I'm doing)
                self.w_accum = {feature: self.w_accum[feature] + c*self.w[feature] for feature in self.w_accum}
                c = 1
                for feature, value in xi.items():
                    self.w[feature] = self.w[feature] + yi*eta*value
                self.w['bias'] = self.w['bias'] + yi*eta
            else:
                c += 1
with predict(self, x) being defined as follows:
def predict(self, x):
    if x.shape[1] == (self.w.shape[0] - 1):  # append the bias component if it is missing
        x = hstack([x, np.ones(1)])
    s = np.dot(x.toarray(), self.w)
    return 1 if s > 0 else -1
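For what it's worth, the way I pictured exploiting the dictionary representation at prediction time is to touch only the active features, roughly like this (just a sketch; predict_dict is an illustrative name, not code I actually ran):

def predict_dict(self, x):
    # x is a {feature: value} dict that contains only the active features
    s = self.w['bias']
    for feature, value in x.items():
        # features never seen at init time simply contribute nothing
        s += self.w.get(feature, 0.0) * value
    return 1 if s > 0 else -1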
The problem with this implementation is that it got SIGKILLed on massive data sets, so I had to come up with a different solution. I turned to DictVectorizer (it actually lives in scikit-learn's feature_extraction module, not scipy) and ran it over the dictionaries to turn them into scipy sparse matrices, so now my w and x are represented as csr_matrix objects.
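The vectorization step itself looks roughly like this (variable names are just for illustration; DictVectorizer returns a CSR matrix by default):

from sklearn.feature_extraction import DictVectorizer

# dict_examples is the list of {feature: value} dicts I was using before
vec = DictVectorizer()                       # sparse (CSR) output by default
x_train = vec.fit_transform(dict_examples)   # scipy.sparse.csr_matrix, one row per example
# y_train stays a plain array of +1 / -1 labels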
Here is what the new updated code looks like:
def train(self, x_train, y_train):
    # epochs and eta are assumed to be defined elsewhere (e.g. on self)
    # Initialize w and the bias
    len_features = x_train[0].shape[1]
    self.w = np.zeros((len_features + 1, 1))
    self.w_accum = np.zeros((len_features + 1, 1))
    c = 0
    # Iterate over the training data n times
    for epoch in range(epochs):
        # Check each training example
        for i in range(x_train.shape[0]):
            xi, yi = x_train[i], y_train[i]
            xi = hstack([xi, np.ones(1)])  # append the bias component
            y_hat = self.predict(xi)
            # Update weights if there is a misclassification
            if yi != y_hat:
                self.w_accum += self.w*c
                c = 1
                self.w += xi.transpose()*eta*yi
            elif yi == y_hat:
                c += 1
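For completeness, the finalization step isn't shown above; at the end of train I would fold the weights that survived the tail of training back into the accumulator, roughly like this (a sketch of my intent; since rescaling doesn't change the sign of the prediction, w_accum itself can serve as the averaged weight vector):

# at the very end of train, after the last epoch:
self.w_accum += self.w * c   # fold in the weights that survived the final stretch
# w_accum (optionally divided by the total number of examples seen) is then
# used as the averaged weight vector at test time; the division doesn't affect the sign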
The predict function is as follows:
def predict(self, x):
    if x.shape[1] == (self.w.shape[0] - 1):  # X_{n+1} = 1, append the bias component if missing
        x = hstack([x, np.ones(1)])
    s = np.dot(x.toarray(), self.w)
    return 1 if s > 0 else -1
The problem is that this code is actually SLOWER than my previous implementation, and I have no idea why. It seems to spend a long time in the x.transpose() call (third-to-last line of train) and in the np.dot call in predict.
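Something like the following is how I'd confirm where the time goes; it is purely illustrative and assumes it sits inside the inner training loop where xi and self.w are in scope:

import time

t0 = time.perf_counter()
xi_t = xi.transpose()               # sparse transpose of the padded example row
t1 = time.perf_counter()
s = np.dot(xi.toarray(), self.w)    # the densified dot product used in predict
t2 = time.perf_counter()
print(f"transpose: {t1 - t0:.6f}s  dot: {t2 - t1:.6f}s")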
I thought using a sparse representation was supposed to be the solution. Is it because I'm taking a dot product over all the features, even the zeros? Is there a way to fix this? I thought sparse vectors were optimal for this sort of thing?
Tags: numpy, perceptron, implementation, scipy, optimization
Category: Data Science