How to properly train your Self-Organized Map?

I recently stumbled upon the Self-Organized Map, an ANN architecture used to cluster high-dimensional data while simultaneously imposing a neighborhood structure on it. It is trained through a competitive learning approach in which neurons compete to respond to a given input. The strongest responding neuron, the best matching unit (BMU), is rewarded by being moved closer to the given input in the data space, and so are its neighbors. However, within the literature and implementations, I find some deviations in how this training is implemented. Specifically, the influence of the BMU on its neighbors is modulated by a neighborhood function:

$$\beta_{ij}(t)=\exp \bigg({{-d^2}\over{2\sigma^2(t)}} \bigg), \quad \text{where } t=1,2,3,\dots,n$$

where $d$ is the distance on the map grid between the BMU and the neighboring neuron, and $\sigma(t)$ is a radius that is decreased during training. As a result, the influence of the BMU's readjustment on its neighborhood shrinks as training progresses. The difference I find concerns how the shrinking of $\sigma(t)$ is implemented. Most explanations and blog posts describe an exponential decrease:

$$\sigma(t) = \sigma_0 \cdot \exp \bigg({{-t}\over{\lambda}} \bigg), \quad \text{where } t=1,2,3,\dots,n$$

where $\lambda$ is a decay constant that can be tuned arbitrarily. Alternatively, I find that some implementations do not really use this exponential decay, but instead use linear interpolation of the form:

$$\sigma(t)=r(2)+{{n-t}\over{n}} \cdot [r(1)-r(2)]$$

where $n$ is the number of training epochs and $r(1)$, $r(2)$ are the start and end radii, which depend on the training phase. These implementations further distinguish explicitly between a 'rough' training phase where:

$$\vec{r}= \bigg( \begin{array}{c} 1 \\ 0.1 \end{array} \bigg) \cdot max(SOM.dims)$$

with e.g. SOM.dims=(100,100) for a $100 \times 100$ SOM, and a 'fine-tuning' training phase where:

$$\vec{r}= \bigg( \begin{array}{c} {0.1 \cdot max(SOM.dims)} \\ 0.1 \end{array} \bigg)$$
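For concreteness, the two schedules above can be written out as follows. This is only a minimal sketch in Python, not taken from any particular implementation; the function names, the number of epochs `n`, and the chosen value of `lam` are my own assumptions for illustration.

```python
import math

def sigma_exponential(t, sigma0, lam):
    """Exponential decay: sigma0 * exp(-t / lam)."""
    return sigma0 * math.exp(-t / lam)

def sigma_linear(t, n, r_start, r_end):
    """Linear interpolation: equals r_start at t = 0 and r_end at t = n."""
    return r_end + (n - t) / n * (r_start - r_end)

def two_phase_radii(dims):
    """(start, end) radii for the 'rough' and 'fine-tuning' phases."""
    largest = max(dims)
    rough = (1.0 * largest, 0.1 * largest)
    fine = (0.1 * largest, 0.1)
    return rough, fine

rough, fine = two_phase_radii((100, 100))
n = 1000  # assumed number of epochs per phase

for name, (r_start, r_end) in (("rough", rough), ("fine", fine)):
    first = sigma_linear(1, n, r_start, r_end)
    last = sigma_linear(n, n, r_start, r_end)
    print(name, first, last)

# Exponential schedule with lam picked (assumption) so that sigma drops
# from 100 to roughly 10 after n steps, i.e. it mimics the rough phase.
print(sigma_exponential(n, sigma0=100.0, lam=n / math.log(10)))
```

Note that the fine-tuning phase starts at the radius where the rough phase ends, so the two linear segments join into one continuous piecewise-linear schedule.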

My problem is that I do not quite understand why there seems to be this disagreement, or what the 'canonical' way of training a SOM is. It certainly makes sense to divide the training into a 'rough' and a 'fine-tuning' phase, so it baffles me a bit that most newer descriptions drop this without further discussion and only consider a single training phase with exponential decay.
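To make the comparison concrete, here is how I understand the training loop, with the radius schedule passed in as a function so that either variant can be plugged in. This is a rough sketch in pure Python under several assumptions of mine: $d$ is taken as the grid distance between a node and the BMU (the usual SOM formulation), the learning rate `lr` is held constant, weights are initialised uniformly at random, and all names (`train_som`, `sigma_fn`, and so on) are hypothetical.

```python
import math
import random

def train_som(data, dims, n_steps, sigma_fn, lr=0.5, seed=0):
    """Online SOM training on a 2-D grid; sigma_fn(t) returns the radius at step t."""
    rng = random.Random(seed)
    rows, cols = dims
    n_features = len(data[0])
    # One weight vector per grid node, random initialisation (assumption).
    weights = {(i, j): [rng.random() for _ in range(n_features)]
               for i in range(rows) for j in range(cols)}
    for t in range(1, n_steps + 1):
        x = rng.choice(data)
        # BMU: node whose weight vector is closest to x in data space.
        bmu = min(weights,
                  key=lambda node: sum((w - xi) ** 2
                                       for w, xi in zip(weights[node], x)))
        sigma = sigma_fn(t)
        for node, w in weights.items():
            # Squared distance on the grid between this node and the BMU.
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            beta = math.exp(-d2 / (2.0 * sigma ** 2))
            # Move the node toward x, scaled by the neighborhood function.
            for k in range(n_features):
                w[k] += lr * beta * (x[k] - w[k])
    return weights

n = 500
dims = (5, 5)
sigma_start, sigma_end = max(dims) / 2, 0.5  # assumed start and end radii

def exp_schedule(t):
    # lam chosen so that the radius reaches sigma_end at t = n.
    lam = n / math.log(sigma_start / sigma_end)
    return sigma_start * math.exp(-t / lam)

def lin_schedule(t):
    return sigma_end + (n - t) / n * (sigma_start - sigma_end)

data_rng = random.Random(1)
data = [[data_rng.random(), data_rng.random()] for _ in range(200)]
weights_exp = train_som(data, dims, n, exp_schedule)
weights_lin = train_som(data, dims, n, lin_schedule)
print(len(weights_exp), len(weights_lin))
```

Both schedules start and end at the same radius here, so any difference between the resulting maps comes only from the shape of the decay in between.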


An answer from Kohonen himself, the inventor of the self-organizing map:

"The true mathematical form of σ(t) is not crucial, as long as its value is fairly large in the beginning of the process. Say, on the order of half of the diameter of the grid, whereafter it is gradually reduced to a fraction of it in about 1000 steps."

From: Kohonen, T., 2013. Essentials of the self-organizing map. Neural networks, 37, pp.52-65.
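Read this way, the decay constant of the exponential schedule is not really a free parameter once a start radius, an end radius, and a number of steps are chosen. As a small worked example (the symbol $\sigma_{end}$ and the numbers below are my own assumptions, not from the paper), requiring $\sigma(n) = \sigma_{end}$ gives:

$$\sigma_{end} = \sigma_0 \cdot \exp \bigg({{-n}\over{\lambda}} \bigg) \quad \Rightarrow \quad \lambda = {{n}\over{\ln(\sigma_0 / \sigma_{end})}}$$

For instance, with $\sigma_0 = 50$, $\sigma_{end} = 5$ and $n = 1000$, this yields $\lambda = 1000 / \ln(10) \approx 434$. A linear schedule between the same two radii over the same number of steps would satisfy the quoted guidance equally well.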

