Multilayer perceptron does not converge

I have been coding my own multilayer perceptron in MATLAB, and it runs without error. My training-data feature, x, takes values from 1 to 360, and the training-data output, y, is $\sin(x)$.
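In code, the data boils down to something like this (a sketch; I am assuming x is meant in degrees):

    x = (1:360)';   % features: 1 to 360
    y = sind(x);    % targets: sine of each angle (sind takes degrees; use sin(x) if x is meant in radians)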

The problem is that my MLP only decreases the cost for the first few iterations and then gets stuck at 0.5. I have tried adding momentum, but it does not help, and increasing the number of layers or neurons does not help either. I am not sure why this is happening.

I have uploaded the files for your reference here.

The summary of my code is:

  1. I normalize my input data using either min-max scaling or the z-score (a minimal sketch of both options is shown after this list)

  2. Initialize random weights and biases in the range -1 to 1

    for i = 1:length(nodesateachlayer)-1
        % uniform random values in [-1, 1]
        weights{i} = 2*rand(nodesateachlayer(i),nodesateachlayer(i+1)) - 1;
        bias{i} = 2*rand(nodesateachlayer(i+1),1) - 1;
    end
    
  3. Then I do a forward pass where the input is multiplied by the weights, the bias is added, and the result is passed through a transfer function (sigmoid)

    for i = 2:length(nodesateachlayer)
        % pre-activation W'*a + b, then apply the transfer function
        stored{i} = nactivate(bsxfun(@plus, weights{i-1}'*stored{i-1}, bias{i-1}), activation);
    end
    
  4. Then I calculate the error and do a backward pass

    dedp = 1/length(normy)*error;
    for i = length(stored)-1:-1:1
        dpds = derivative(stored{i+1},activation);   % derivative of the activation
        deds = dpds'.*dedp;                          % delta at this layer
        dedw = stored{i}*deds;                       % gradient w.r.t. the weights
        dedb = ones(1,rowno)*deds;                   % gradient w.r.t. the bias (summed over samples)
        dedp = (weights{i}*deds')';                  % error propagated to the previous layer
        weights{i} = weights{i} - rate.*dedw;
        bias{i} = bsxfun(@minus, bias{i}, rate.*dedb');
    end
    
  5. I have the cost plotted out at every iteration to see the descent
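For step 1, the normalization is one of the following two options (a minimal sketch; normx is just an illustrative name for the normalized features):

    % min-max scaling to [0, 1]
    normx = (x - min(x)) ./ (max(x) - min(x));

    % or z-score standardization
    normx = (x - mean(x)) ./ std(x);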

I assume there is something wrong with the code, so where could the error possibly lie?

Topic mlp gradient-descent neural-network

Category Data Science


My view on your question is that tiny networks seldom work. The above method uses a neural network to learn the function $y=\sin(x)$. Although this problem seems simple, it cannot be expected to be solved by a really tiny network (the above model is a 5-layer MLP with hidden sizes [5, 6, 7], which is very small).

Even if back-propagation is implemented correctly, would such a model learn anything? No. Assuming TensorFlow implements back-propagation correctly, here is the result of the same architecture in TensorFlow: fig1

You can see it learns almost nothing. In fact, the MSE loss stays very close to 0.5, exactly as you observed.
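That 0.5 figure is no coincidence: the mean of $\sin^2(x)$ over a full period is $1/2$, so a model that collapses to predicting a constant near zero ends up with

$$\text{MSE} = \frac{1}{2\pi}\int_0^{2\pi}\sin^2(x)\,dx = \frac{1}{2},$$

which is exactly where your cost curve gets stuck.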

My suggestion is to try a 3-layer MLP with a hidden size of 256. Here is the result:

fig2

You can see it is much better; the MSE is below 0.1 now.

------------------code---------------------

    # TensorFlow 1.x-style API (tf.placeholder / tf.layers)
    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt

    # training data: angles 0..359 degrees, targets are the sine of the angle
    x_ = np.atleast_2d(np.arange(0, 360, 1)).T
    y_ = np.atleast_2d(np.sin(x_/180*np.pi))

    g = tf.Graph()
    with g.as_default():
        with tf.variable_scope("mlp"):
            input_x = tf.placeholder(shape=[None, 1], dtype=tf.float32)
            input_y = tf.placeholder(shape=[None, 1], dtype=tf.float32)
            layer1 = tf.layers.dense(inputs=input_x, units=256, activation=tf.nn.sigmoid)
            #layer2 = tf.layers.dense(inputs=input_x, units=6, activation=tf.nn.sigmoid)
            #layer3 = tf.layers.dense(inputs=input_x, units=7, activation=tf.nn.sigmoid)
            output_y = tf.layers.dense(inputs=layer1, units=1)  # linear output layer
            loss = tf.losses.mean_squared_error(input_y, output_y)
            train_op = tf.train.AdagradOptimizer(0.01).minimize(loss)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for epoch in range(300):
                _, loss_ = sess.run((train_op, loss), feed_dict={input_x: x_, input_y: y_})
            y_hat_ = sess.run(output_y, feed_dict={input_x: x_, input_y: y_})
            print(loss_, end='\t')

    plt.plot(x_, y_, 'g', x_, y_hat_, 'b')
    plt.legend(['ground truth', 'predicted'])
    plt.show()
