Gradient descent formula implementation in Python

I recently started Andrew Ng's ML course, and this is the formula Andrew lays out for the gradient descent update on a linear model.

$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$

As we can see, the formula asks us to sum over all the rows in the data.

However, the code below doesn't work when I apply np.sum():

def gradientDescent(X, y, theta, alpha, num_iters):

    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()

    J_history = [] # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * np.sum(temp))
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history

On the other hand, if I get rid of the np.sum(), the code works perfectly:

def gradientDescent(X, y, theta, alpha, num_iters):

    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy arrays
    # are passed by reference to functions
    theta = theta.copy()

    J_history = [] # Use a python list to save cost in every iteration

    for i in range(num_iters):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - ((alpha / m) * temp)
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history

Can someone please explain this?


The commenters are correct: you are confusing vector and scalar operations.

The formula is a scalar one; here is how you could implement it literally:

for n in range(num_iters):

    theta_new = theta.copy()  # buffer, so all theta_j are updated simultaneously

    for j in range(len(theta)):

        sum_j = 0
        for i in range(len(X)):
            temp = np.dot(X[i], theta) - y[i]  # h_theta(x^(i)) - y^(i)
            temp = temp * X[i, j]              # ... * x_j^(i)
            sum_j += temp

        theta_new[j] = theta[j] - (alpha / m) * sum_j

    theta = theta_new

    J_history.append(computeCost(X, y, theta))
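
As a sanity check, the triple loop above computes exactly the same gradient vector as the vectorized expression np.dot(X.T, np.dot(X, theta) - y) from your second snippet. A minimal sketch with made-up random data:

import numpy as np

rng = np.random.default_rng(0)            # invented data, just for the check
X = rng.normal(size=(5, 3))               # m = 5 examples, p = 3 parameters
y = rng.normal(size=5)
theta = rng.normal(size=3)

# scalar (triple-loop) gradient, one entry per theta_j
grad_loop = np.zeros_like(theta)
for j in range(len(theta)):
    for i in range(len(X)):
        grad_loop[j] += (np.dot(X[i], theta) - y[i]) * X[i, j]

# vectorized gradient: X^T (X theta - y)
grad_vec = np.dot(X.T, np.dot(X, theta) - y)

print(np.allclose(grad_loop, grad_vec))   # True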

But you're plugging vectors into the scalar formula, and that's what causes the confusion.
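
To see concretely what the extra np.sum() does: it collapses the length-p gradient vector to a single scalar, which NumPy then broadcasts and subtracts from every theta_j, so all parameters receive the same (wrong) step. A tiny sketch with invented numbers:

import numpy as np

theta = np.array([0.5, -1.0])
grad = np.array([0.2, 0.4])            # per-parameter gradients, length p

print(theta - 0.1 * grad)              # [ 0.48 -1.04]  element-wise update
print(theta - 0.1 * np.sum(grad))      # [ 0.44 -1.06]  one scalar broadcast to all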


Your goal is to compute the gradient for the whole theta vector of size $p$ (the number of variables). Your temp is also a vector of size $p$, which contains the gradients of the cost function with respect to each of your theta values.

Therefore, you want to subtract the two vectors element-wise (scaled by the learning rate $\alpha$) to make an update, so there is no reason to sum the vector.
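
A quick shape check (dimensions invented for illustration) makes this concrete:

import numpy as np

m, p = 5, 3                        # say, 5 training examples and 3 parameters
X = np.ones((m, p))
y = np.ones(m)
theta = np.ones(p)

temp = np.dot(X, theta) - y        # residuals h_theta(x) - y, shape (m,)
temp = np.dot(X.T, temp)           # gradient vector, shape (p,), one entry per theta_j
print(temp.shape)                  # (3,)

theta = theta - (0.01 / m) * temp  # element-wise update of two length-p vectors
print(theta.shape)                 # (3,)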
