What is the dimensionality of the bias term in neural networks?
I am trying to build a neural network (3 layers, 1 hidden) in Python on the classic Titanic dataset. I want to include a bias term, following Siraj's examples and the 3Blue1Brown tutorials, and update it by backpropagation, but I know my dimensionality is wrong (I suspect I am updating the biases incorrectly, and that this is what causes the wrong dimensions).
The while loop in the code below works on the training dataset, where the node products and biases have the same dimensions, but once I pass a test example into the predict function, the dimensions no longer match up and I get an error. I have commented the code with the dimensions of the dot products between nodes and inputs.
Can someone help me understand what the dimensionality of the bias term should be, both in this particular case and in general, and how it should be added (row-wise, column-wise)?
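To make the mismatch concrete, here is a toy sketch (the shapes below are made up for illustration, not my actual Titanic splits): a bias created with one row per training sample adds fine during training, but cannot be broadcast against activations computed from a different number of test rows.

import numpy as np

M, m, N, H = 700, 150, 10, 32           # hypothetical: train rows, test rows, features, hidden units
x_train = np.random.random((M, N))
x_test = np.random.random((m, N))
syn0 = np.random.random((N, H))
b1 = np.random.random((M, H))           # bias with one row per training sample, as in my code below

train_act = np.dot(x_train, syn0) + b1  # (M,H) + (M,H): works
test_act = np.dot(x_test, syn0) + b1    # (m,H) + (M,H): raises a broadcasting ValueError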
Code:
import numpy as np

def sigmoid(x, deriv=False):
    """Activation function; with deriv=True, x is assumed to be the sigmoid output and x*(1-x) is returned."""
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
# learning rate, hidden layer dimension, error threshold, dropout rate
alpha, hidden_size, threshold, drop_rate = (0.035,32,0.1,0.5)
# x_train and y_train are the training dataset and corresponding classes
# syn0 and syn1 are the synapses, weight matrices between layers (3 layers, 2 synapses)
syn0 = 2*np.random.random((x_train.shape[1],hidden_size)) - 1 # NxH
syn1 = 2*np.random.random((hidden_size,1)) - 1 # Hx1
b1 = np.random.random((x_train.shape[0],hidden_size)) # MxH
b2 = np.random.random((x_train.shape[0],1)) # Mx1
layer_2_error = 100*np.abs(np.random.random((y_train.shape[0],1))) - 1 # Mx1
avg_err = []
count = 0
while np.mean(np.abs(layer_2_error)) > threshold:
    # Forward
    layer_0 = x_train  # training dataset
    A = np.dot(layer_0, syn0) + b1  # MxN X NxH + MxH ~ MxH
    layer_1 = sigmoid(A)
    # drop out to reduce overfitting
    layer_1 *= np.random.binomial([np.ones((len(x_train), hidden_size))], 1 - drop_rate)[0] * (1 / (1 - drop_rate))
    B = np.dot(layer_1, syn1) + b2  # MxH X Hx1 + Mx1 ~ Mx1
    layer_2 = sigmoid(B)
    # Backprop
    layer_2_error = layer_2 - y_train  # Mx1
    layer_2_delta = layer_2_error * sigmoid(layer_2, deriv=True)  # Mx1 * Mx1 ~ Mx1
    layer_1_error = np.dot(layer_2_delta, syn1.T)  # Mx1 X 1xH ~ MxH
    layer_1_delta = layer_1_error * sigmoid(layer_1, deriv=True)  # MxH * MxH ~ MxH
    # update weights
    syn1 -= alpha * np.dot(layer_1.T, layer_2_delta)  # HxM X Mx1 ~ Hx1
    syn0 -= alpha * np.dot(layer_0.T, layer_1_delta)  # NxM X MxH ~ NxH
    # update biases
    b2 -= alpha * layer_2_delta  # Mx1
    b1 -= alpha * layer_1_delta  # MxH
    avg_err.append(np.mean(np.abs(layer_2_error)))
    if count % 500 == 0:
        print("Error after", count, "iterations:", np.mean(np.abs(layer_2_error)))
    count += 1
def predict(x, w0, w1, b1, b2):
    """Predict an output for data x, given weight matrices w0, w1 and biases b1, b2."""
    A = np.dot(x, w0) + b1  # mxN X NxH (+ MxH) ~ mxH
    layer_1 = sigmoid(A)
    B = np.dot(layer_1, w1) + b2  # mxH X Hx1 (+ Mx1) ~ mx1 (preds)
    layer_2 = B
    return (sigmoid(layer_2) > 0.5).astype(int)
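For reference, this is roughly how I call it (x_test here stands for my held-out Titanic split, built the same way as x_train, so it has the same number of columns but fewer rows):

preds = predict(x_test, syn0, syn1, b1, b2)  # fails at "+ b1" with a broadcasting error, since b1 has x_train.shape[0] rows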
Topic backpropagation neural-network python
Category Data Science