Can an ANN with a single hidden layer get XOR wrong?
I'm still fairly new to artificial neural networks. I've played around with TensorFlow, but now I'm trying to get the basics straight. I stumbled upon a course that explains how to implement an ANN with backpropagation in Unity, in C#, so I did just that.
While test-running the ANN with one hidden layer containing two neurons, I noticed that it doesn't always get XOR right, no matter how many epochs it runs or how the learning rate is set. With some settings it happens more often than with others.
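To make the setup concrete, here is a minimal sketch of the kind of network and training loop I mean. This is not my actual Unity code; I'm assuming sigmoid activations, squared-error loss, and plain per-sample gradient descent, and all names in it are made up for illustration:

```csharp
using System;

// Minimal 2-2-1 network: two inputs, one hidden layer with two sigmoid
// neurons, one sigmoid output, trained per-sample with plain gradient
// descent on squared error.
class XorNet
{
    double[,] wHidden = new double[2, 2]; // hidden weights [neuron, input]
    double[] bHidden = new double[2];     // hidden biases
    double[] wOut = new double[2];        // output weights
    double bOut;                          // output bias

    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    public XorNet(Random rng)
    {
        // random initialization in [-1, 1] -- the only stochastic part
        for (int i = 0; i < 2; i++)
        {
            for (int j = 0; j < 2; j++) wHidden[i, j] = rng.NextDouble() * 2 - 1;
            bHidden[i] = rng.NextDouble() * 2 - 1;
            wOut[i] = rng.NextDouble() * 2 - 1;
        }
        bOut = rng.NextDouble() * 2 - 1;
    }

    // Forward pass; fills `hidden` so the backward pass can reuse it.
    public double Forward(double[] x, double[] hidden)
    {
        for (int i = 0; i < 2; i++)
            hidden[i] = Sigmoid(wHidden[i, 0] * x[0] + wHidden[i, 1] * x[1] + bHidden[i]);
        return Sigmoid(wOut[0] * hidden[0] + wOut[1] * hidden[1] + bOut);
    }

    public void TrainStep(double[] x, double target, double lr)
    {
        var hidden = new double[2];
        double output = Forward(x, hidden);

        // output delta: (target - output) times the sigmoid derivative
        double deltaOut = (target - output) * output * (1 - output);

        // hidden deltas: backpropagate deltaOut through the (old) output weights
        var deltaHidden = new double[2];
        for (int i = 0; i < 2; i++)
            deltaHidden[i] = deltaOut * wOut[i] * hidden[i] * (1 - hidden[i]);

        // output layer update; the bias acts like a weight on a constant input of 1
        for (int i = 0; i < 2; i++) wOut[i] += lr * deltaOut * hidden[i];
        bOut += lr * deltaOut;

        // hidden layer update
        for (int i = 0; i < 2; i++)
        {
            for (int j = 0; j < 2; j++) wHidden[i, j] += lr * deltaHidden[i] * x[j];
            bHidden[i] += lr * deltaHidden[i];
        }
    }
}

class Program
{
    static void Main()
    {
        double[][] inputs = { new double[] { 0, 0 }, new double[] { 0, 1 },
                              new double[] { 1, 0 }, new double[] { 1, 1 } };
        double[] targets = { 0, 1, 1, 0 };

        var net = new XorNet(new Random(1)); // fixed seed, one training run
        for (int epoch = 0; epoch < 10000; epoch++)
            for (int k = 0; k < 4; k++)
                net.TrainStep(inputs[k], targets[k], 0.5);

        var h = new double[2];
        for (int k = 0; k < 4; k++)
            Console.WriteLine($"{inputs[k][0]} {inputs[k][1]} -> {net.Forward(inputs[k], h):F2}");
    }
}
```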
Usually I get something like this (columns: first input, second input, network output):
+---+---+------+
| 0 | 0 | 0.01 |
+---+---+------+
| 0 | 1 | 0.99 |
+---+---+------+
| 1 | 0 | 0.99 |
+---+---+------+
| 1 | 1 | 0.01 |
+---+---+------+
But on other occasions it looks more like this:
+---+---+------+ +---+---+------+ +---+---+------+
| 0 | 0 | 0.33 | | 0 | 0 | 0.01 | | 0 | 0 | 0.33 |
+---+---+------+ +---+---+------+ +---+---+------+
| 0 | 1 | 0.99 | | 0 | 1 | 0.99 | | 0 | 1 | 0.33 |
+---+---+------+ or +---+---+------+ or +---+---+------+
| 1 | 0 | 0.66 | | 1 | 0 | 0.50 | | 1 | 0 | 0.99 |
+---+---+------+ +---+---+------+ +---+---+------+
| 1 | 1 | 0.01 | | 1 | 1 | 0.50 | | 1 | 1 | 0.33 |
+---+---+------+ +---+---+------+ +---+---+------+
I've noticed that in every such case, the sum of the four outputs is ~2 (incidentally, the four target values also sum to 2). It doesn't happen most of the time, but still quite often: depending on the settings I use, it happens every two or three runs, or only once in 10 or 20 runs. To me it seems more like a mathematical quirk of the stochastic side of neural networks (the random weight initialization), but I'm not good enough at math to figure this one out by myself.
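If it helps to reproduce this: re-running the sketch above with a different seed per run is exactly the experiment I mean, since the seed only changes the random starting weights. For example, as a drop-in replacement for `Main`:

```csharp
// Try many random initializations; each seed is an independent
// training run from a different random starting point.
static void Main()
{
    double[][] inputs = { new double[] { 0, 0 }, new double[] { 0, 1 },
                          new double[] { 1, 0 }, new double[] { 1, 1 } };
    double[] targets = { 0, 1, 1, 0 };

    for (int seed = 0; seed < 20; seed++)
    {
        var net = new XorNet(new Random(seed));
        for (int epoch = 0; epoch < 10000; epoch++)
            for (int k = 0; k < 4; k++)
                net.TrainStep(inputs[k], targets[k], 0.5);

        var h = new double[2];
        Console.Write($"seed {seed}:");
        for (int k = 0; k < 4; k++)
            Console.Write($" {net.Forward(inputs[k], h):F2}");
        Console.WriteLine();
    }
}
```

If my suspicion is right, most seeds should converge properly while a few end up in the 0.33/0.50/0.66 patterns shown above.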
The question is: assuming the implementation is as simple as possible, with no advanced concepts, is it plausible for something like this to happen, or is it definitely an error in the implementation? If it's not an implementation error, what is going on here? Is it because of the very symmetric nature of XOR, which, as far as I understand, is the reason a single neuron can't handle it?
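For reference, here is the standard argument for why a single neuron (a linear threshold unit) can't represent XOR, which is the fact I'm leaning on: such a neuron fires exactly when $w_1 x_1 + w_2 x_2 + b > 0$, so getting XOR right would require

$$w_2 + b > 0, \qquad w_1 + b > 0, \qquad b \le 0, \qquad w_1 + w_2 + b \le 0.$$

Adding the first two inequalities gives $w_1 + w_2 + 2b > 0$, while adding the last two gives $w_1 + w_2 + 2b \le 0$, a contradiction. So XOR isn't linearly separable and needs at least one hidden layer.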
I know I could post the source code as well, but I have already double- and triple-checked everything, since I previously had a mistake in the bias calculation; back then the values were completely off all the time. Now I'm just wondering whether this sort of thing can actually happen in a correctly implemented neural network.
Topic: homework, mathematics, beginner, neural-network, machine-learning
Category: Data Science