Combining image and scalar inputs into a neural network

Question

Vladislav Isaev

March 30, 2022, 18:07

I'm looking for the best way to combine a CNN that takes an image input with a scalar value.

I know that one way is to concatenate the flatten layer with this scalar value. But the flatten layer consists of, for example, 2048 such scalar values, whose magnitudes differ from that of the single input value. And what if, in a real task, this scalar value has more influence than the image? Another example is combining text and an image and then applying some fusion on top, but I think that is a slightly different task, because the text model and the CNN produce fairly similar vectors. Another solution is to apply an ML algorithm such as XGBoost on top of the CNN's flatten layer and this scalar value. But in that case the CNN has to be trained separately, which is not good.

Can someone tell me the best way to combine an image input with a scalar value, so that I can train the CNN together with the scalar input and the network will "decide" which input is more important?

Topic cnn ensemble-modeling deep-learning neural-network machine-learning

Category Data Science

Brian Spiering · Accepted Answer · October 29, 2021, 14:36

There are many options. One of the most common is to feed the image into a Convolutional Neural Network (CNN) and follow the convolutional layers with a fully connected (dense) layer. The scalar values are concatenated at that fully connected layer. Additional fully connected layers can be added after it.

The deep learning model will automatically learn to weigh the importance of the scalar and image values through the training process.
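As a rough illustration of the approach described above, here is a minimal PyTorch sketch. The class name `ImageScalarNet`, the layer sizes, the 32x32 RGB input, and the regression loss are all assumptions made for the example, not part of the answer. The scalar is assumed to be standardized already (roughly zero mean, unit variance), which is one common way to keep its magnitude comparable to the image features.

```python
import torch
import torch.nn as nn


class ImageScalarNet(nn.Module):
    """Hypothetical model: CNN image branch fused with one scalar by concatenation."""

    def __init__(self, num_outputs: int = 1):
        super().__init__()
        # Image branch: a small illustrative CNN (sizes are arbitrary assumptions).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
            nn.Flatten(),             # -> (batch, 32)
        )
        # Fully connected layer after the CNN layers.
        self.image_fc = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        # Further fully connected layers applied after the scalar is concatenated.
        self.head = nn.Sequential(
            nn.Linear(64 + 1, 32),
            nn.ReLU(),
            nn.Linear(32, num_outputs),
        )

    def forward(self, image: torch.Tensor, scalar: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_fc(self.cnn(image))     # (batch, 64)
        fused = torch.cat([img_feat, scalar], dim=1)  # (batch, 65)
        return self.head(fused)


torch.manual_seed(0)
model = ImageScalarNet()

# Dummy batch: 4 RGB images of 32x32 and one scalar per image.
images = torch.randn(4, 3, 32, 32)
scalars = torch.randn(4, 1)  # assumed to be standardized beforehand
targets = torch.randn(4, 1)

# One joint training step: gradients reach both the CNN and the scalar's weights.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
predictions = model(images, scalars)
loss = nn.functional.mse_loss(predictions, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the loss is backpropagated through the concatenation, the CNN and the weights attached to the scalar are trained together in a single optimization. If the scalar seems underrepresented next to the 64 image features, one possible variation is to pass it through its own small dense layer before concatenating.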
