How do you find the eigenvalues of the matrix for the following momentum gradient descent?

The following question is based purely on the material available on MIT's OpenCourseWare YouTube channel (https://www.youtube.com/watch?v=wrEcHhoJxjM). In it, Professor Gilbert Strang explains the general formulation of the momentum gradient descent problem and ultimately arrives at optimal values (40:05 in the video) for the parameters $s$ and $\beta$.

$\textbf{Background}$



Let's begin with standard gradient descent, which is not covered in this video. The update equation is:

$x_{k+1}=x_k-s \nabla f(x_k)$

Here $s$ is the step size and $f(x_k)$ is the value of the function $f$ evaluated at the point $x_k$. The gradient means we take the derivative of $f$ with respect to each element of $x$, so if $x$ is a vector (i.e. $f$ is a multivariable function) with two components, it looks like $\nabla f(x_k)=\left(\frac{\partial f}{\partial x_{k_{1}}},\ \frac{\partial f}{\partial x_{k_{2}}}\right)$, where $x_{k_{2}}$ is the second element of the vector $x_k$. $k$ is just the iteration number.
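Just to fix ideas, here is a minimal sketch of this update in code (my own toy example, not from the lecture; the quadratic, step size and starting point are all arbitrary choices):

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T S x, so grad f(x) = S x (arbitrary example).
S = np.array([[1.0, 0.0],
              [0.0, 0.1]])

def grad_f(x):
    return S @ x

s = 0.5                      # step size (arbitrary choice)
x = np.array([1.0, 1.0])     # starting point (arbitrary choice)

for k in range(100):
    x = x - s * grad_f(x)    # x_{k+1} = x_k - s * grad f(x_k)

print(x)                     # should approach the minimizer at the origin
```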

The momentum gradient descent problem is then identified as:

$x_{k+1}=x_k-s z_{k}$

$z_k=\nabla f_k +\beta z_{k-1}$

Without sidetracking too much, I am assuming that when Strang writes $\nabla f_k$ he means $\nabla f(x_k)$.
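If that reading is right, then unrolling the $z$ recursion (starting from $z_0=\nabla f_0$, which is my assumption) shows that $z_k$ is a $\beta$-weighted sum of all the past gradients, which is presumably where the "momentum" interpretation comes from:

$z_k=\nabla f_k+\beta\nabla f_{k-1}+\beta^2\nabla f_{k-2}+\cdots+\beta^k\nabla f_0$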

He then sets up the following problem (as I understand it, for a quadratic $f(x)=\frac{1}{2}x^T S x$, so that $\nabla f_{k+1}=S x_{k+1}$):

$\begin{bmatrix} 1 & 0 \\ -S & 1 \end{bmatrix} \begin{bmatrix} x_{k+1} \\ z_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & -s \\ 0 & \beta \end{bmatrix} \begin{bmatrix} x_k \\ z_k \end{bmatrix}$

And jumping a little ahead:

$\begin{bmatrix} c_{k+1} \\ d_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & -s \\ \lambda & \beta-\lambda s \end{bmatrix} \begin{bmatrix} c_k \\ d_k \end{bmatrix}$

This considers that $x_k$ and $z_k$ are tracking the eigenvector, and he writes the following relationships:

$x_k=c_k q, \qquad z_k=d_k q$

where $q$ is the eigenvector (so $c_k$ and $d_k$ are the scalar coefficients of $x_k$ and $z_k$ along $q$).
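For my own bookkeeping, here is how I believe the single-matrix form is obtained (my reconstruction of the step he jumps over, not verbatim from the lecture): substitute $x_k=c_k q$, $z_k=d_k q$ and $Sq=\lambda q$ into the two-matrix system above, cancel $q$, and multiply through by the inverse of the left-hand matrix:

$\begin{bmatrix} c_{k+1} \\ d_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\lambda & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & -s \\ 0 & \beta \end{bmatrix} \begin{bmatrix} c_k \\ d_k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \lambda & 1 \end{bmatrix} \begin{bmatrix} 1 & -s \\ 0 & \beta \end{bmatrix} \begin{bmatrix} c_k \\ d_k \end{bmatrix} = \begin{bmatrix} 1 & -s \\ \lambda & \beta-\lambda s \end{bmatrix} \begin{bmatrix} c_k \\ d_k \end{bmatrix}$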

Moving ahead, he finally describes the matrix $R$ as the matrix that multiplies the current values of $c_k$ and $d_k$ to obtain the next iteration:

$R=\begin{bmatrix} 1 & -s \\ \lambda & \beta -\lambda s \end{bmatrix}$



$\textbf{Question}$



And here is where the root of my problem begins. He explains that the objective is to choose $s$ and $\beta$ so that the larger of the two eigenvalue magnitudes of $R$ is minimized, i.e.:



$\text{Objective: } \min_{s,\beta}\ \max(|e_1|,|e_2|)$



Additionally, he gives optimal values of $s$ and $\beta$:



$s_{\text{optimal}}=\left(\frac{2}{1+\sqrt{b}}\right)^2$



$\beta_{\text{optimal}}=\left(\frac{1-\sqrt{b}}{1+\sqrt{b}}\right)^2$
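The formulas themselves are easy to evaluate; here is a tiny numerical check with an arbitrary sample value for $b$ (I am not making any claim here about what $b$ stands for, only evaluating the two expressions):

```python
import numpy as np

b = 0.01                                          # arbitrary sample value
s_opt = (2.0 / (1.0 + np.sqrt(b))) ** 2           # (2 / (1 + sqrt(b)))^2
beta_opt = ((1.0 - np.sqrt(b)) / (1.0 + np.sqrt(b))) ** 2

print(s_opt, beta_opt)                            # b = 0.01 gives s ~ 3.31, beta ~ 0.67
```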



This is where my confusion begins. $\textbf{Does this mean that we want to find the eigenvalues of R}$ and then plug them back in to obtain the new $c$'s and $d$'s, i.e. by solving the following?



$\det(R-\lambda I)=0$?



I ran this through Wolfram Alpha and got the following characteristic equation:



$\beta-\lambda(2s+1+\beta)+\lambda ^2 (s+1)=0$



Or is $R$ already the result of the operation on the previous matrices minus some $\lambda I$, i.e. should we just set $\det(R)=0$ to find the eigenvalues?
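Either way, one sanity check I thought of (my own sketch, not from the lecture) is to build $R$ for arbitrary numerical values of $s$, $\beta$ and $\lambda$, let numpy compute its eigenvalues directly, and then see which candidate characteristic equation those numbers actually satisfy:

```python
import numpy as np

# Arbitrary sample values, only to check the algebra numerically.
s, beta, lam = 0.1, 0.5, 2.0

R = np.array([[1.0, -s],
              [lam, beta - lam * s]])

eigvals = np.linalg.eigvals(R)      # these are the e_1, e_2 in the objective
print(eigvals)
print(max(abs(eigvals)))            # the quantity max(|e_1|, |e_2|)

# Substituting these eigenvalues into a candidate characteristic polynomial
# should give (numerically) zero if that polynomial is the right one.
```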

Additionally, once we obtain $c_{k+1}$ and $d_{k+1}$, how do we move forward in the problem? In other words, how do we obtain $x_{k+1}$ and $z_{k+1}$? It seems as if we don't even need to find the eigenvalues of $R$, because if we have the optimal values of $s$ and $\beta$ we can just plug them back into the original set of equations:



$x_{k+1}=x_k-s z_{k}$

$z_k=\nabla f_k +\beta z_{k-1}$
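That last option is what I would try in practice; here is a minimal sketch of iterating the original two equations directly once $s$ and $\beta$ are fixed (again my own toy quadratic with $\nabla f(x)=Sx$; the parameter values are arbitrary, not claimed to be the optimal ones):

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T S x, so grad f(x) = S x (arbitrary example).
S = np.array([[1.0, 0.0],
              [0.0, 0.1]])

s, beta = 0.5, 0.3            # arbitrary step size and momentum parameter
x = np.array([1.0, 1.0])      # x_0, arbitrary starting point
z = np.zeros(2)               # z_{-1} = 0, my assumed initialization

for k in range(100):
    z = S @ x + beta * z      # z_k = grad f(x_k) + beta * z_{k-1}
    x = x - s * z             # x_{k+1} = x_k - s * z_k

print(x)                      # should approach the minimizer at the origin
```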
