Applying activation on part of the layer in Keras

Context

I am trying to implement the YOLO algorithm in Keras. What I have so far is the following network:

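from tensorflow.keras.layers import (Input, Rescaling, Conv2D, LeakyReLU,
                                     MaxPooling2D, Flatten, Dense, Reshape)

i = Input(shape=(image_height, image_width, image_channels))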
rescaled = Rescaling(1./255)(i)
x = Conv2D(16, (1, 1))(rescaled)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU(alpha=0.3)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(16, (3, 3))(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU(alpha=0.3)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(256, activation='sigmoid')(x)
x = Dense(grid_width * grid_height * anchor_number * (5 + class_count))(x)
x = Reshape((grid_width, grid_height, anchor_number, (5 + class_count)))(x)

This is supposed to output, for each grid cell and anchor box, a vector $(p(c), b_x, b_y, b_h, b_w, class_0, class_1, ..., class_n)$.

The problem

Some components of the output vector, namely $p(c)$, $b_x$ and $b_y$, are limited to the range 0 to 1, so they should pass through a sigmoid activation. The $class_0, class_1, ..., class_n$ part is a classification, so it should pass through a softmax activation. So I need a way to specify which part of the output uses which activation.

TL;DR: How do I apply different activation functions to different parts of the network output?



The YOLO architecture uses the softmax activation function in the output layer to determine the classes of objects in bounding boxes. The code you have shared, however, uses sigmoid for prediction, so it seems that you're trying to implement YOLOv3.

Please look at the original paper. It will help give you a better understanding.

The authors of YOLOv3 refrained from applying softmax to the classes, because softmax rests on the assumption that classes are mutually exclusive. For example, if there are classes like “cat” and “animal” in the dataset and one of the objects in the bounding boxes is a cat, this assumption fails, because a cat is also an animal. Instead, independent logistic classifiers predict each class score, and a threshold is used to perform multilabel classification of the objects detected in images. With this setup, which is trained with binary cross-entropy loss, whether an element belongs to one class is not influenced by whether it belongs to another.
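As for applying different activations to different parts of the output, one possible approach is to slice the last axis, activate each slice separately, and concatenate the results. The following is only a minimal sketch under assumptions: the sizes are made-up values, the `features` input stands in for the output of your `Dense(256)` layer, and leaving $b_h$ and $b_w$ linear is my choice here, not something taken from the paper.

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import (Activation, Concatenate, Dense, Input,
                                     Lambda, Reshape)

# Hypothetical sizes, chosen only for illustration.
grid_width, grid_height, anchor_number, class_count = 7, 7, 3, 4

# Stand-in for the 256 features produced by the convolutional part.
features = Input(shape=(256,))
x = Dense(grid_width * grid_height * anchor_number * (5 + class_count))(features)
x = Reshape((grid_width, grid_height, anchor_number, 5 + class_count))(x)

# Slice the last axis: [p(c), b_x, b_y | b_h, b_w | class scores].
conf_xy = Lambda(lambda t: t[..., 0:3])(x)
hw = Lambda(lambda t: t[..., 3:5])(x)
classes = Lambda(lambda t: t[..., 5:])(x)

# Sigmoid squashes p(c), b_x and b_y into (0, 1).
conf_xy = Activation("sigmoid")(conf_xy)
# Independent logistic classifiers, one per class. For mutually
# exclusive classes, Activation("softmax") over the last axis
# could be used here instead.
classes = Activation("sigmoid")(classes)

# Reassemble the vector in its original order; b_h and b_w stay linear.
outputs = Concatenate(axis=-1)([conf_xy, hw, classes])
model = Model(features, outputs)

prediction = model.predict(np.random.randn(2, 256).astype("float32"), verbose=0)
print(prediction.shape)
```

With sigmoid class scores, each class can then be thresholded on its own (for example, keeping every class whose score exceeds 0.5), which is what allows an object to be both “cat” and “animal”.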
