Can a single-layer ANN get XOR wrong?

I'm still pretty new to artificial neural networks. While I've played around with TensorFlow, I'm now trying to get the basics straight. I stumbled upon a course that explains how to implement an ANN with backpropagation in Unity, in C#, so I did just that.

While test-running the ANN with one hidden layer containing 2 neurons, I noticed that it doesn't always get XOR right, no matter how many epochs it runs or how the learning rate is set. With some settings it happens more often than with others.

Usually I get something like this:

+---+---+------+
| 0 | 0 | 0.01 |
+---+---+------+
| 0 | 1 | 0.99 |
+---+---+------+
| 1 | 0 | 0.99 |
+---+---+------+
| 1 | 1 | 0.01 |
+---+---+------+

But in other occasions it looks more like this:

+---+---+------+      +---+---+------+      +---+---+------+
| 0 | 0 | 0.33 |      | 0 | 0 | 0.01 |      | 0 | 0 | 0.33 |
+---+---+------+      +---+---+------+      +---+---+------+
| 0 | 1 | 0.99 |      | 0 | 1 | 0.99 |      | 0 | 1 | 0.33 |
+---+---+------+  or  +---+---+------+  or  +---+---+------+
| 1 | 0 | 0.66 |      | 1 | 0 | 0.50 |      | 1 | 0 | 0.99 |
+---+---+------+      +---+---+------+      +---+---+------+
| 1 | 1 | 0.01 |      | 1 | 1 | 0.50 |      | 1 | 1 | 0.33 |
+---+---+------+      +---+---+------+      +---+---+------+

I've noticed that in every failing case, the sum of the outputs is ~2. It doesn't happen most of the time, but still quite often: depending on what settings I use, it happens every two or three runs, or only after 10 or 20 runs. To me it seems like a mathematical quirk arising from the stochastic nature of neural networks, but I'm not good enough at math to figure this one out by myself.

The question is: Assuming the implementation is as simple as possible, with no advanced concepts, is it likely for something like this to happen, or is it definitely an error in the implementation? If it's not an error in the implementation, what is going on here? Is it because of the very symmetrical nature of XOR, which (as far as I understand) is the reason a single neuron can't handle it?

I know I could post the source code as well, but I already double- and triple-checked everything, since I previously had a mistake in the bias calculation. Back then the values were completely off all the time. Now I'm just wondering if this sort of thing could actually happen with a correctly implemented neural network.

Tags: homework, mathematics, beginner, neural-network, machine-learning

Category Data Science


Assuming the implementation is as simple as possible, with no advanced concepts, is it likely for something like this to happen or is it definitely an error in the implementation?

In my experience, using the simplest possible network and the simplest gradient descent algorithm, then yes, this happens relatively frequently. It is an accident of the starting weight values, and technically a local minimum of the cost function, which is why it is so stable when it happens. In the basic implementation you have, there are only 6 weights. If they are selected randomly, the chance of a "special" pattern (such as the weights into the hidden layer being all positive or all negative) is relatively high: with 4 weights between the input and the hidden layer, all-positive or all-negative occurs with probability $2 \times (1/2)^4 = 1/8$.
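The 1-in-8 figure can be checked by brute force (a small Python sketch; the variable names are mine, not from the question's code):

```python
from itertools import product

# Enumerate every sign pattern of the 4 input->hidden weights
# (2 inputs x 2 hidden neurons). With a symmetric random init,
# each weight is equally likely to start positive or negative.
patterns = list(product([+1, -1], repeat=4))

# "Special" patterns: all four weights share the same sign, so both
# hidden neurons start out computing near-identical functions.
special = [p for p in patterns if len(set(p)) == 1]

print(len(special), "of", len(patterns))             # 2 of 16
print("probability:", len(special) / len(patterns))  # 0.125, i.e. 1 in 8
```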

This is also why the values sum to ~2: given that the network is stuck on the wrong part of the error surface, it will still minimise the cost function as best it can under that constraint, and this usually ends up with compromise values that still match the statistical means of the targets overall. If you doubled up some, but not all, of the input/output pairs (e.g. a training set of 6 examples $\{(0,0{:}0), (0,1{:}1), (1,0{:}1), (1,1{:}0), (0,1{:}1), (1,0{:}1)\}$), then the network might converge to a different wrong mean value when it fails.
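The mean-matching intuition is simple arithmetic (a sketch; variable names are mine):

```python
# Targets for the standard 4-example XOR training set.
targets = [0, 1, 1, 0]
print(sum(targets))                 # 2 -> why the stuck outputs sum to ~2
print(sum(targets) / len(targets))  # 0.5, the overall target mean

# The duplicated 6-example set from above: (0,1:1) and (1,0:1) appear twice,
# shifting the mean a stuck network would drift toward.
targets_6 = [0, 1, 1, 0, 1, 1]
print(sum(targets_6) / len(targets_6))  # ~0.667, a different "compromise" mean
```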

Is it because of the very symmetrical nature of an XOR, which makes it impossible to handle for a single neuron?

You don't have a single neuron here. Unless you mean the single neuron in the output layer? In that case, no, this has nothing to do with having a single neuron in the output layer.

Pretty much any more advanced NN feature, or simply more randomness, will stop this problem from happening. E.g. give the hidden layer 4 neurons instead of 2, use momentum terms, or sample random mini-batches from a larger dataset.

In general, this kind of problem does not seem to happen on larger, more complex datasets and larger, more complex networks. These can have other problems, but getting stuck in a local minimum far away from the global minimum tends not to happen. In addition, in those scenarios you typically don't want to converge fully to a global minimum for your dataset and error function, but are looking for some form of generalised model (one that can predict from input values you have not seen before).


On a practical note, if you want to add an automated test showing that your NN implementation can solve XOR, then either use fixed starting weights or an RNG seed that you know works. Then your test will be reliable, even if the NN is not in all cases.
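One deterministic building block for such a test (a Python sketch rather than C#; the weight values are hand-picked for illustration, not taken from the question's code) is to verify the forward pass with fixed weights known to solve XOR: hidden neuron 1 approximates OR, hidden neuron 2 approximates AND, and the output computes roughly OR AND NOT AND.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    # Hand-picked weights that solve XOR exactly (hypothetical values).
    h1 = sigmoid(10 * x1 + 10 * x2 - 5)   # ~OR of the inputs
    h2 = sigmoid(10 * x1 + 10 * x2 - 15)  # ~AND of the inputs
    return sigmoid(10 * h1 - 20 * h2 - 5)  # ~(OR and not AND) = XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = xor_net(x1, x2)
    assert round(y) == (x1 ^ x2)
    print(x1, x2, round(y, 3))
```

A training test with a fixed RNG seed can then sit on top of this, so a failure points at the learning code rather than at an unlucky initialisation.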


Is it because of the very symmetrical nature of an XOR, which makes it impossible to handle for a single neuron?

Yes, because the XOR problem is not linearly separable. With a single-layer MLP, you can only draw linear separation boundaries between the samples. I suggest that you read this post:

https://medium.com/@jayeshbahire/the-xor-problem-in-neural-networks-50006411840b
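The non-separability is easy to verify directly. Suppose a single unit could separate XOR with a linear function $f(x_1, x_2) = w_1 x_1 + w_2 x_2 + b$, positive on the 1-labelled inputs and negative on the 0-labelled ones. Adding the inequalities pairwise gives

$$f(0,1) > 0 \;\text{and}\; f(1,0) > 0 \;\Rightarrow\; w_1 + w_2 + 2b > 0,$$
$$f(0,0) < 0 \;\text{and}\; f(1,1) < 0 \;\Rightarrow\; w_1 + w_2 + 2b < 0,$$

a contradiction, so no such line exists.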

If you want to represent non-linear decision boundaries with an MLP, you should add one or more hidden layers; it's as simple as that!
