Need help understanding how this Neural Network is working

This is a model I came across, and I need some help understanding how it works.

It uses the South German Credit Prediction data set from Kaggle:

!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00573/SouthGermanCredit.zip

import zipfile

with zipfile.ZipFile('SouthGermanCredit.zip', 'r') as zip_ref:
    zip_ref.extractall('./SouthGermanCredit/')

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
from tensorflow.keras.layers.experimental import preprocessing
from sklearn.model_selection import train_test_split

batch_size = 32
learning_rate = 1e-3

# features and labels are assumed to have been loaded from the extracted data set
trainX, testX, trainY, testY = train_test_split(features, labels, test_size=0.2, random_state=69)

normalizer = preprocessing.Normalization()
normalizer.adapt(np.array(trainX))

model = tf.keras.Sequential([
      normalizer,
      layers.Dense(128, activation='elu', kernel_regularizer=regularizers.l2(0.01)),
      layers.Dropout(0.5),
      layers.Dense(128, activation='elu', kernel_regularizer=regularizers.l2(0.01)),
      layers.Dropout(0.5),
      layers.Dense(2),
      layers.Softmax()])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

model.fit(trainX, trainY, epochs=50, verbose=0, batch_size=batch_size)
test_loss, test_acc = model.evaluate(testX, testY, verbose=2)
dnn_predictions = model.predict(testX)

My questions are:

  1. What is the normalizer doing in the sequential model?
  2. In layers.Dense, what is the shape of the input and the shape of the output?
  3. What does layers.Dropout(0.5) mean and do?
  4. What does model.compile do, and why do we need it? This GitHub link also implements the same model, but there is no model.compile anywhere. Why?

UPDATE

After I run model.summary(), this is what I get:

Model: sequential_16
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
normalization_16 (Normalizat multiple                  41        
_________________________________________________________________
dense_48 (Dense)             multiple                  2688      
_________________________________________________________________
dropout_32 (Dropout)         multiple                  0         
_________________________________________________________________
dense_49 (Dense)             multiple                  16512     
_________________________________________________________________
dropout_33 (Dropout)         multiple                  0         
_________________________________________________________________
dense_50 (Dense)             multiple                  258       
_________________________________________________________________
softmax_16 (Softmax)         multiple                  0         
=================================================================
Total params: 19,499
Trainable params: 19,458
Non-trainable params: 41

I do not know what this means.

  1. It makes each feature have a mean of 0 and a standard deviation of 1. In ML modeling we usually (not always) want this so that all features have a comparable scale and no feature plays a bigger role merely because of its larger magnitude. An additional reason in neural networks is to avoid exploding gradients. Reference. (See the sketch after this list.)

  2. You can use model.summary() to show the sizes of the parameters. From your code, the output size of the $1^{st}$ and $2^{nd}$ Dense layers is 128, the input size of the $2^{nd}$ and $3^{rd}$ Dense layers is 128, and the output size of the $3^{rd}$ Dense layer is 2. The input size of the $1^{st}$ Dense layer is the number of features in your data (20, as the update below shows).

  3. Dropout is a regularization technique that, during training, randomly drops a certain percentage of the units of its previous layer. The percentage is 50% (= 0.5) in your code. (It is also demonstrated in the sketch after this list.)

  4. The Keras documentation for model.compile shows what you can and need to specify with it. Your code uses Keras's ready-made, dedicated function calls to complete the model, while the link's code defines for itself (e.g.) how to run the training. You could reproduce your code in the link's way and do the exact same thing, but why the extra work? You would write code like the link's when the ready-made Keras calls cannot satisfy your needs. If you look at code block #10 in that link, it specifies the optimizer, loss function, and metric that you would otherwise pass to model.compile. (A minimal sketch of such a manual training loop appears below.)
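
To make points 1 and 3 concrete, here is a small self-contained sketch (the toy data is invented for illustration) of what the Normalization and Dropout layers do:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing

# Toy data: two features on very different scales
x = np.array([[1., 1000.], [2., 2000.], [3., 3000.]], dtype=np.float32)

norm = preprocessing.Normalization()
norm.adapt(x)    # learns the per-feature mean and variance from the data
print(norm(x))   # each column now has mean 0 and standard deviation 1

drop = tf.keras.layers.Dropout(0.5)
print(drop(norm(x), training=True))   # about half the values zeroed, the rest scaled by 1/(1 - 0.5)
print(drop(norm(x), training=False))  # dropout inactive at inference time: output unchanged

Note that Keras sets training=True automatically during model.fit and training=False during model.evaluate and model.predict, so Dropout only acts while training.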

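For point 4, here is a minimal sketch of roughly what model.compile plus model.fit do under the hood, written as a manual training loop with tf.GradientTape. It mirrors the general pattern of the linked notebook rather than its exact code, and it reuses model, trainX, trainY, learning_rate, and batch_size from your code:

optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
train_ds = tf.data.Dataset.from_tensor_slices(
    (np.array(trainX), np.array(trainY))).batch(batch_size)

for epoch in range(50):
    for x_batch, y_batch in train_ds:
        with tf.GradientTape() as tape:
            preds = model(x_batch, training=True)   # forward pass, dropout active
            loss = loss_fn(y_batch, preds)          # same loss you passed to compile
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

model.compile simply stores the optimizer, loss, and metrics so that model.fit can run a loop like this for you.
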
UPDATE

How to read the Param # column in the summary:

2688 = 20*128 + 128

16512 = 128*128 + 128

258 = 128*2 + 2

You can read the above as this (since they are all Dense layers): $\text{NumParams} = \text{InputSize} \times \text{OutputSize} + \text{OutputSize}$

You have the $+\,\text{OutputSize}$ term because your Dense layers have bias enabled (by default).
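
As a quick check, the arithmetic can be reproduced directly (the input size of 20 is inferred from the first Dense layer's 2688 parameters):

# Param # = InputSize * OutputSize + OutputSize (weights plus biases)
print(20 * 128 + 128)   # 2688  -> dense_48
print(128 * 128 + 128)  # 16512 -> dense_49
print(128 * 2 + 2)      # 258   -> dense_50
# The 41 non-trainable params are the Normalization layer's adapted state:
print(20 + 20 + 1)      # 41 -> per-feature mean, per-feature variance, and one sample count

The multiple entries in the Output Shape column appear because the model was never given an explicit input shape; if you add tf.keras.Input(shape=(20,)) as the first element of the Sequential model, model.summary() should print concrete shapes such as (None, 128) instead.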
