How do the authors get this update formula for all $\beta$ in the $\beta$-divergence?
I'm reading the paper Algorithms for nonnegative matrix factorization with the β-divergence by Cédric Févotte and Jérôme Idier. The scikit-learn package uses their algorithm in the module sklearn.decomposition.NMF. In Section 4.1, they say:
An MM algorithm can be derived by minimizing the auxiliary function $G(\mathbf{h} \mid \tilde{\mathbf{h}})$ w.r.t. $\mathbf{h}$. Given the convexity and the separability of the auxiliary function, the optimum is obtained by canceling the gradient given by Eq. (36). This is trivially done and leads to the following update: $$ h_{k}^{\mathrm{MM}} = \tilde{h}_{k}\left(\frac{\sum_{f} w_{f k} v_{f} \tilde{v}_{f}^{\beta-2}}{\sum_{f} w_{f k} \tilde{v}_{f}^{\beta-1}}\right)^{\gamma(\beta)}. $$
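For readers without the paper at hand: if I read Section 4.1 correctly, $\tilde{v}_f = \sum_k w_{fk}\tilde{h}_k$ is the current approximation of $v_f$, and $\gamma(\beta)$ is the piecewise exponent
$$ \gamma(\beta) = \begin{cases} \dfrac{1}{2-\beta} & \text{if } \beta < 1, \\[4pt] 1 & \text{if } 1 \le \beta \le 2, \\[4pt] \dfrac{1}{\beta-1} & \text{if } \beta > 2. \end{cases} $$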
The gradient in Eq. (36) depends on the chosen decomposition of the $\beta$-divergence into convex, concave, and constant parts, so I don't see how the authors arrive at a single explicit formula for $h_{k}^{\mathrm{MM}}$ that covers every $\beta$. Could you please elaborate on this point?
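Here is the one case I think I can check by hand; I am applying Jensen's inequality to the convex part myself, so any slip below is mine rather than the paper's. For $1 \le \beta \le 2$ the divergence is convex in its second argument, the concave part of the decomposition vanishes, and (if my auxiliary function is right) the gradient becomes
$$ \nabla_{h_k} G(\mathbf{h} \mid \tilde{\mathbf{h}}) = \sum_f w_{fk}\left[\left(\tilde{v}_f \frac{h_k}{\tilde{h}_k}\right)^{\beta-1} - v_f \left(\tilde{v}_f \frac{h_k}{\tilde{h}_k}\right)^{\beta-2}\right]. $$
Setting this to zero and dividing the two resulting sums gives
$$ \frac{h_k}{\tilde{h}_k} = \frac{\sum_f w_{fk} v_f \tilde{v}_f^{\beta-2}}{\sum_f w_{fk} \tilde{v}_f^{\beta-1}}, $$
which is the stated update with $\gamma(\beta) = 1$. What I cannot reproduce is how the same closed form, with the exponent $\gamma(\beta)$, falls out for $\beta < 1$ and $\beta > 2$, where the convex-concave split is different.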
Topic matrix-factorisation scikit-learn
Category Data Science