Need help understanding the formula for gradient descent with multiple features

I am trying to implement gradient descent with multiple features after watching Andrew Ng's Coursera lecture. The lecture gives this update rule for each parameter theta_j (applied simultaneously for all j):

theta_j := theta_j - alpha * (1/m) * sum over i = 1..m of ( h(x^(i)) - y^(i) ) * x_j^(i)

So, for example, when computing the update for theta_1, part of the formula has you subtract the actual value y^(i) from the predicted value h(x^(i)) to get the error, and at the end of the formula you multiply that error by the value of feature 1 in the i-th training example, denoted x_1^(i) (x superscript (i), subscript 1).
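To make that concrete with made-up numbers: if for training example i the prediction is h(x^(i)) = 7, the actual value is y^(i) = 5, and feature 1 is x_1^(i) = 2, then that example contributes (7 - 5) * 2 = 4 to the sum inside the theta_1 update.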

So my question is: when updating each theta, do you multiply the error by the corresponding feature value? For example, when updating theta_1 you multiply the error by the value of feature 1, when updating theta_2 you multiply it by the value of feature 2, and so on?

But if that's the case, how would I calculate theta_0, since there is no feature 0?

Let me try to write out what I mean as a rough sketch (this is just my current understanding, so it may well be wrong):
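Here is a minimal NumPy sketch; the helper name gradient_descent_step and the toy data are mine, not from the lecture, and I have assumed a constant column x_0 = 1 is prepended to X, which is what I think the lecture does to make theta_0 fit the same formula:

```python
import numpy as np

def gradient_descent_step(X, y, theta, alpha):
    m = len(y)
    predictions = X @ theta          # h(x^(i)) for every training example
    errors = predictions - y         # h(x^(i)) - y^(i); the same error term for every theta_j
    gradient = (X.T @ errors) / m    # row j sums error_i * x_j^(i) over the examples
    return theta - alpha * gradient  # simultaneous update of all thetas

# Toy data: 3 examples, 2 real features, plus a constant x_0 = 1 column
# (my assumption about how the lecture handles theta_0).
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])
theta = np.zeros(3)
theta = gradient_descent_step(X, y, theta, alpha=0.01)
print(theta)
```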

Topic: linear-regression, gradient-descent, machine-learning
