Training seems to be plateauing at every learning rate
So firstly, I have a network that I'm using to approximate the value of a function. Recently, at about 50,000 training steps, it stopped making any further progress, at any learning rate. The question is: what design or training flaw could this be a symptom of?
To track progress, after each individual backpropagation (one per training value per epoch) I run all of the training inputs through the model, subtract each expected value supplied with the training data (the computed value of the function), and sum up the absolute values of the differences. The idea is that the closer this total gets to 0, the better trained the model is. Here $I(x)$ is the inferred value for training input $x$, and $F(x)$ is the actual expected value.
E.g.
$$D(x) = \sum_{i=1}^{n} | F(x_i) - I(x_i) |$$
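Roughly, that sum is computed along these lines (total_abs_error, train_inputs and train_targets are illustrative names, not the ones in my actual script; the targets are assumed to be a tensor with the same number of elements as the model output):

import torch

def total_abs_error(model, inputs, targets):
    # D = sum over i of |F(x_i) - I(x_i)|, evaluated over the whole training set
    with torch.no_grad():
        predictions = model(inputs)  # I(x_i) for every training input
        return (targets.view_as(predictions) - predictions).abs().sum().item()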
As training progresses through each epoch, I pay particular attention to this sum after the very first training value and after the very last. Subtracting one from the other shows the overall direction the model moved.
So when $$D(x_{\text{End}}) - D(x_{\text{Start}}) < 0$$ the values are getting closer to the correct ones overall; when it is positive they are drifting further away. This corresponds to the Start and End values in the script output below, which summarizes the changes per training epoch.
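A rough sketch of that per-epoch bookkeeping, reusing the illustrative total_abs_error above (optimizer and loss_fn are assumed to already exist; the optimizer setup is sketched further below):

epoch_d = []
for x, y in zip(train_inputs, train_targets):  # one backpropagation per training value
    optimizer.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)).view(-1), y.view(-1))
    loss.backward()
    optimizer.step()
    # re-run the entire training set after this single update
    epoch_d.append(total_abs_error(model, train_inputs, train_targets))
direction = epoch_d[-1] - epoch_d[0]  # negative: improving overall, positive: drifting away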
After about 50,000 training steps this value stopped descending. Instead of approaching 0, even slowly, it began to increase slowly. Previously I tried to compensate by overfitting on individual values that produced a decrease across the whole dataset, and found that this was not successful; that approach is not in play here.
As you can see in Figures 1 and 2 below, which show matplotlib plots of the model output as green dots against the contour of the model, the data is roughly beginning to fit the curve, but if you note the scale there is still a very long way to go before it is usable. For reference I have also included some of my script output.
The neural network is laid out in PyTorch as follows:
self.linear_relu_stack = nn.Sequential(
    nn.Linear(2, 1024),
    # nn.Hardtanh(),
    nn.Linear(1024, 1024),
    nn.Linear(1024, 1024),
    nn.Linear(1024, 1),
    nn.Sigmoid()
)
I have been adjusting the learning rate manually. As of this writing I have dropped it to lr=0.0000001 (1e-7).
I have also adjusted the momentum from 0.9 to 0.5, and the loss is still occurring.
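For reference, the optimizer is configured along these lines (plain SGD with momentum here is illustrative; the lr and momentum values are the ones mentioned above):

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.0000001,   # i.e. 1e-7, lowered manually over time
    momentum=0.5,   # lowered from 0.9
)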
Script Output:
Last Epoch Differences (p - e) Direction: 8.067116141319275e-06 Start: 22.35144703007279 End: 22.351455097188932 Mean: 22.35145098814251 SD: 2.4139466179864307e-06 Min: 22.351446987231952 Max: 22.351455233162028 Training 66400 of 400 total trains 66400 total epochs 166 On epoch 1 on index 399 normalized counts {-3: 0, -2: 102, -1: 107, 0: 92, 1: 99, 2: 0, 3: 0}
Last Epoch Differences (p - e) Direction: 7.612630724906921e-06 Start: 22.351455067386603 End: 22.351462680017328 Mean: 22.351458123493693 SD: 2.5673223508895607e-06 Min: 22.351454374482607 Max: 22.351462741484617 Training 66800 of 400 total trains 66800 total epochs 167 On epoch 2 on index 399 normalized counts {-3: 0, -2: 84, -1: 146, 0: 93, 1: 51, 2: 26, 3: 0}
Last Epoch Differences (p - e) Direction: 8.674338459968567e-06 Start: 22.351462620412676 End: 22.351471294751136 Mean: 22.35146532604446 SD: 2.7537054590466787e-06 Min: 22.35146138547894 Max: 22.351471432586877 Training 67200 of 400 total trains 67200 total epochs 168 On epoch 3 on index 399 normalized counts {-3: 0, -2: 74, -1: 145, 0: 113, 1: 66, 2: 2, 3: 0}
Last Epoch Differences (p - e) Direction: 7.897615432739258e-06 Start: 22.351471348767863 End: 22.351479246383295 Mean: 22.351474757627347 SD: 2.232492475207527e-06 Min: 22.35147131896554 Max: 22.351479350691424 Training 67600 of 400 total trains 67600 total epochs 169 On epoch 4 on index 399 normalized counts {-3: 4, -2: 45, -1: 176, 0: 101, 1: 74, 2: 0, 3: 0}
Last Epoch Differences (p - e) Direction: 8.223578333854675e-06 Start: 22.351479380493736 End: 22.35148760407207 Mean: 22.35148368777217 SD: 2.314186504137269e-06 Min: 22.35147900982735 Max: 22.351487760534262