Matrix multiplication
I have the downstream gradient for every sample (one row per $x_i$):
$$ \begin{bmatrix} 0.0062123 & -0.00360166 & -0.00479891 \\ -0.01928449 & 0.01240768 & 0.01493274 \\ -0.01751177 & 0.01140975 & 0.01469825 \\ 0.0074906 & -0.00531709 & -0.00637952 \end{bmatrix} $$
And I have my inputs (my local gradient):
$$ \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix} $$
I want to calculate the gradient with respect to the weights, and to do this I transpose the downstream gradient matrix and then do a matrix multiplication:

`downstream_gradient.T @ local_gradient`
First question: I understand that this output is the SUM of the gradients over each $x_i$. Am I right or am I wrong?
Second question: Do I need to divide the result by len($X$) to get the mean gradient?
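To make the question concrete, here is a small NumPy sketch with the matrices above (the variable names `dY`, `X`, and `dW` are my own choices, not from any particular library). It checks that the matrix product is exactly the sum of per-sample outer products, which is what the first question asks about:

```python
import numpy as np

# Downstream gradient dY: one row per sample (4 samples, 3 outputs)
dY = np.array([
    [ 0.0062123,  -0.00360166, -0.00479891],
    [-0.01928449,  0.01240768,  0.01493274],
    [-0.01751177,  0.01140975,  0.01469825],
    [ 0.0074906,  -0.00531709, -0.00637952],
])

# Inputs X: one row per sample (4 samples, 2 features)
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
], dtype=float)

# Weight gradient, shape (2, 3); this is the transpose of
# dY.T @ X, so both conventions carry the same information.
dW = X.T @ dY

# The matrix product equals the SUM of per-sample outer products:
dW_sum = sum(np.outer(X[i], dY[i]) for i in range(len(X)))
assert np.allclose(dW, dW_sum)

# Mean gradient over the batch (only needed if the loss is a
# mean over samples rather than a sum):
dW_mean = dW / len(X)
```

So the answer sketch is: yes, the product sums the per-sample gradients, and dividing by `len(X)` gives the mean; whether the mean is the right thing depends on whether your loss averages or sums over the batch.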