Optimizing an averaged perceptron algorithm using numpy and scipy instead of dictionaries

So I'm trying to write an averaged perceptron algorithm (page 48 here for the equation) in Python. Instead of storing every historical weight vector, I accumulate the weights multiplied by the consistency counter $c$; that running sum is the variable w_accum.
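To spell out what I mean by accumulating (this is just my reading of the trick on that page, written in my own notation):

$$\bar{w} = \frac{1}{T}\sum_{t=1}^{T} w_t = \frac{1}{T}\sum_{k} c_k\, w^{(k)}$$

where $w^{(k)}$ is the $k$-th distinct weight vector and $c_k$ is how many consecutive rounds it was in use; so whenever $w$ is about to change, I add $c \cdot w$ to w_accum and reset $c$.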

My initial implementation represented the weight vector and x as dictionaries, where a feature appears in the dictionary only if it's active; that was the most efficient approach I could think of. Here is that code:

def train(self):
    # Initialize w and the accumulator, plus their bias terms
    self.w, self.w['bias'] = {feature: 0.0 for feature in features}, 0.0
    self.w_accum, self.w_accum['bias'] = {feature: 0.0 for feature in features}, 0.0
    c = 0
    # Iterate over the training data n times
    for it in range(iterations):
        # Check each training example
        for i in range(len(x_train)):
            xi, yi = x_train[i], y_train[i]
            y_hat = self.predict(xi)
            # Update weights if there is a misclassification
            if yi != y_hat:
                # self.w_accum += self.w*c (basically that's what I'm doing)
                self.w_accum = {feature: self.w_accum[feature] + c * self.w[feature]
                                for feature in self.w_accum}
                c = 1
                for feature, value in xi.items():
                    self.w[feature] = self.w[feature] + yi * eta * value
                self.w['bias'] = self.w['bias'] + yi * eta
            else:
                c += 1

with predict(self, x) being defined as follows:

def predict(self, x):
    if x.shape[1] == (self.w.shape[0] - 1):  # Bias
        x = hstack([x, np.ones(1)])
    s = np.dot(x.toarray(), self.w)
    return 1 if s > 0 else -1
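For what it's worth, under the dictionary representation the score behind that prediction reduces to a sum over only the active features plus the bias. Written out as a standalone sketch (the score helper is hypothetical, not part of my actual class):

    # hypothetical helper: the score under the dict representation
    def score(self, x):
        s = self.w['bias']
        for feature, value in x.items():       # only active features appear in x
            s += self.w.get(feature, 0.0) * value
        return s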

The problem with this implementation is that it got SIGKILL'ed on massive data sets, so I had to come up with a different solution. I turned to DictVectorizer (from scikit-learn's feature_extraction module) and ran it over the dictionaries to turn them into scipy sparse matrices. Now each x is a csr_matrix row and w is a dense numpy array.
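To be concrete, the vectorization step looks roughly like this (a minimal sketch; the feature dicts and the variable name vec are just illustrative):

    from sklearn.feature_extraction import DictVectorizer

    # each example is a dict that contains only its active features
    raw_x = [{'word=the': 1.0, 'suffix=ing': 1.0},
             {'word=cat': 1.0}]

    vec = DictVectorizer()              # sparse output by default
    x_train = vec.fit_transform(raw_x)  # scipy CSR matrix, shape (n_samples, n_features)

Here is what the new updated train code looks like: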

def train(self, x_train, y_train):
    # Initialize w and the accumulator (last entry is the bias)
    len_features = x_train[0].shape[1]
    self.w = np.zeros((len_features + 1, 1))
    self.w_accum = np.zeros((len_features + 1, 1))
    c = 0
    # Iterate over the training data n times
    for epoch in range(epochs):
        # Check each training example
        for i in range(x_train.shape[0]):
            xi, yi = x_train[i], y_train[i]
            xi = hstack([xi, np.ones(1)])  # append the bias input
            y_hat = self.predict(xi)
            # Update weights if there is a misclassification
            if yi != y_hat:
                self.w_accum += self.w * c
                c = 1
                self.w += xi.transpose() * eta * yi
            else:
                c += 1

and the predict function is as follows:

def predict(self, x):
    if x.shape[1] == (self.w.shape[0] - 1):  # X_{n+1} = 1
        x = hstack([x, np.ones(1)])
    s = np.dot(x.toarray(), self.w)
    return 1 if s > 0 else -1

The problem is that this code is actually SLOWER than my previous implementation! I have no idea why. It seems to spend a long time around the xi.transpose() call (third-to-last line of train) and in the np.dot call in predict.

I thought switching to a sparse matrix representation was supposed to be the solution. Is it because I'm taking a dot product over all the features, even the zeros? Is there a way to fix this? I thought sparse vectors were supposed to be ideal for exactly this sort of thing.
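For example, this is the kind of comparison I have in mind (a minimal sketch with made-up sizes, not my actual data, just to illustrate the dense conversion I'm worried about):

    import numpy as np
    from scipy.sparse import random as sparse_random

    n_features = 100000
    x = sparse_random(1, n_features, density=0.001, format='csr')  # one sparse example row
    w = np.zeros((n_features, 1))

    s_dense = np.dot(x.toarray(), w)  # densifies the whole row, touches every zero
    s_sparse = x.dot(w)               # uses only the stored nonzeros

The second form is what I expected csr_matrix to buy me, but my predict goes through toarray() every time.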

Topic: numpy, perceptron, implementation, scipy, optimization

Category: Data Science
