Neural network for MNIST: very low accuracy

I am working on the handwritten digit recognition problem by implementing a neural network. However, the accuracy of the network comes out very low, around 11% on the training set. I am not sure what is wrong with my program. I tried changing the learning rate and the number of hidden units, but no luck. Could anyone please take a look and help me figure out what I am missing? My Julia code is below:

# install
Pkg.add("MNIST");
using MNIST

# training data
X,y = traindata(); 
m = size(X, 2);
inputLayerSize = size(X,1); 
hiddenLayerSize = 300;
outputLayerSize = 10;

# representing each output as an array of size of the output layer
eyeY = eye(outputLayerSize);
intY = [convert(Int64,i)+1 for i in y];
Y = zeros(outputLayerSize, m);
for i = 1:m
    Y[:,i] = eyeY[:, intY[i]];
end

# weights with bias
Theta1 = randn(inputLayerSize+1, hiddenLayerSize); 
Theta2 = randn(hiddenLayerSize+1, outputLayerSize); 

function sigmoid(z)
    g = 1.0 ./ (1.0 + exp(-z));
    return g;
end

function sigmoidGradient(z)
  return sigmoid(z).*(1-sigmoid(z));
end

# learning rate
alpha = 0.01;
# number of iterations
epoch = 20;
# cost per epoch
J = zeros(epoch,1);
# backpropagation algorithm
for i = 1:epoch
    for j = 1:m # for each input
        # Feedforward
        # input layer
        # add one bias element
        x1 = [1; X[:,j]];

        # hidden layer
        z2 = Theta1'*x1;
        x2 = sigmoid(z2);
        # add one bias element
        x2 = [1; x2];

        # output layer
        z3 = Theta2'*x2;
        x3 = sigmoid(z3);

        # Backpropagation process
        # delta for output layer
        delta3 = x3 - Y[:,j];
        delta2 = (Theta2[2:end,:]*delta3).*sigmoidGradient(z2) ;

        # update weights
        Theta1 = Theta1 - alpha* x1*delta2';
        Theta2 = Theta2 - alpha* x2*delta3';
    end
end

function predict(Theta1, Theta2, X)
    m = size(X, 2); 
    p = zeros(m, 1);
    h1 = sigmoid(Theta1'*[ones(1,size(X,2)); X]);
    h2 = sigmoid(Theta2'*[ones(1,size(h1,2)); h1]);
    # 1 index is for 0, 2 for 1 ...so forth
    for i=1:m
        p[i,:] = indmax(h2[:,i])-1;
    end
    return p;
end

function accuracy(truth, prediction)
    m = length(truth);
    correct = 0;
    for i=1:m
        if truth[i] == prediction[i]
            correct = correct + 1;
        end
    end
    return (correct/m)*100;
end

pred = predict(Theta1, Theta2, X);
println("train accuracy: ", accuracy(y, pred));

Tags: julia, neural-network, machine-learning

Category: Data Science


What loss function are you using? It looks to me like you're using squared-error loss (am I right?). That can work, but consider using cross-entropy loss instead, which is better suited to classification problems.
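For reference, a per-example cross-entropy cost for your sigmoid outputs could look something like this (a sketch written in the same style as your code; crossEntropyCost is a name I made up, and t stands for the one-hot target Y[:,j]):

function crossEntropyCost(x3, t)
    # binary cross-entropy summed over the 10 output units (log applied elementwise)
    return -sum(t .* log(x3) + (1 - t) .* log(1 - x3));
end

You could accumulate this into the J[i] you already allocate, to check that the cost actually decreases from epoch to epoch.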

Also, by using the logistic function as the activation in the last layer, you're treating the problem as ten separate binary classification problems. While this can also work, since this is a multi-class classification problem you should probably change the activation of the last layer to softmax.
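Something like the following would work as a drop-in replacement for the last sigmoid (again only a sketch; the softmax function below is not part of your code):

function softmax(z)
    # subtract the maximum for numerical stability; exp is applied elementwise
    e = exp(z - maximum(z));
    return e ./ sum(e);
end

Then in the feedforward pass you would use x3 = softmax(z3) instead of x3 = sigmoid(z3).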

One error I do spot in your code: with the squared-error loss you are currently using, you're not taking into account the derivative of the nonlinearity in the last layer. To fix this, you should change

delta3 = x3 - Y[:,j];

to

delta3 = (x3 - Y[:,j]) .* sigmoidGradient(z3);

similar to the way you calculate delta2.
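Note that if you adopt both suggestions above (softmax outputs trained with cross-entropy loss), the output delta simplifies back to

delta3 = x3 - Y[:,j];

because the derivative of the cross-entropy cancels the derivative of the softmax. The extra sigmoidGradient(z3) factor is only needed as long as you keep squared error with sigmoid outputs.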
