How do the authors get this update formula for all $\beta$ in the $\beta$-divergence?
I'm reading the paper Algorithms for nonnegative matrix factorization with the β-divergence by Cédric Févotte and Jérôme Idier. The scikit-learn package uses their algorithm in the module sklearn.decomposition.NMF. In Section 4.1, they say:
An MM algorithm can be derived by minimizing the auxiliary function $G(\mathbf{h} \mid \tilde{\mathbf{h}})$ w.r.t. $\mathbf{h}$. Given the convexity and the separability of the auxiliary function, the optimum is obtained by canceling the gradient given by Eq. (36). This is trivially done and leads to the following update: $$ h_{k}^{\mathrm{MM}} = \tilde{h}_{k}\left(\frac{\sum_{f} w_{f k} v_{f} \tilde{v}_{f}^{\beta-2}}{\sum_{f} w_{f k} \tilde{v}_{f}^{\beta-1}}\right)^{\gamma(\beta)}. $$
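For readers without the paper at hand: if I read Section 4.1 correctly, $\tilde{v}_f = \sum_k w_{fk}\tilde{h}_k$ is the current approximation of $v_f$, and $\gamma(\beta)$ is the piecewise exponent
$$ \gamma(\beta) = \begin{cases} \dfrac{1}{2-\beta} & \text{if } \beta < 1, \\[4pt] 1 & \text{if } 1 \le \beta \le 2, \\[4pt] \dfrac{1}{\beta-1} & \text{if } \beta > 2. \end{cases} $$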
The gradient in Eq. (36) depends on the chosen decomposition of the $\beta$-divergence into convex, concave, and constant parts, so I don't see how the authors arrive at a single explicit formula for $h_{k}^{\mathrm{MM}}$ that covers every $\beta$. Could you please elaborate on this point?
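Here is the one case I think I can check by hand; I am applying Jensen's inequality to the convex part myself, so any slip below is mine rather than the paper's. For $1 \le \beta \le 2$ the divergence is convex in its second argument, the concave part of the decomposition vanishes, and (if my auxiliary function is right) the gradient becomes
$$ \nabla_{h_k} G(\mathbf{h} \mid \tilde{\mathbf{h}}) = \sum_f w_{fk}\left[\left(\tilde{v}_f \frac{h_k}{\tilde{h}_k}\right)^{\beta-1} - v_f \left(\tilde{v}_f \frac{h_k}{\tilde{h}_k}\right)^{\beta-2}\right]. $$
Setting this to zero and dividing the two resulting sums gives
$$ \frac{h_k}{\tilde{h}_k} = \frac{\sum_f w_{fk} v_f \tilde{v}_f^{\beta-2}}{\sum_f w_{fk} \tilde{v}_f^{\beta-1}}, $$
which is the stated update with $\gamma(\beta) = 1$. What I cannot reproduce is how the same closed form, with the exponent $\gamma(\beta)$, falls out for $\beta < 1$ and $\beta > 2$, where the convex-concave split is different.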
Topic matrix-factorisation scikit-learn
Category Data Science