Segmentation Network produces noisy output

I've implemented a SegNet and a SegNet ReLU variant in PyTorch. I'm using it as a proof of concept for now, but what really bothers me is the noise produced by the network. With Adam I seem to get slightly less noise, whereas with SGD the noise increases. I can see the loss going down and the cross-validation accuracy rising to 98%-99%, and yet the noise is still there.

On the left is the actual image, then the mask, and finally the actual output from the network. There are 1024 samples per class, and two classes, which are very consistent since the documents are highly structured. I'm using the vanilla SegNet (same kernels, striding and padding) on 224x224 inputs.
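For context, the training setup is the standard per-pixel cross-entropy one. Here is a minimal sketch of that setup, with a toy two-layer convolutional stack standing in for the actual SegNet (not reproduced here); the shapes and the mean-reduced per-pixel loss are the relevant parts:

```python
import torch
import torch.nn as nn

# Toy placeholder model: anything mapping (N, 3, 224, 224) images
# to (N, 2, 224, 224) per-class logits fits this setup.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, kernel_size=3, padding=1),
)

criterion = nn.CrossEntropyLoss()  # per-pixel CE, averaged over all pixels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(4, 3, 224, 224)         # dummy batch
masks = torch.randint(0, 2, (4, 224, 224))   # class index per pixel, two classes

optimizer.zero_grad()
logits = model(images)                       # (4, 2, 224, 224)
loss = criterion(logits, masks)
loss.backward()
optimizer.step()

pred = logits.argmax(dim=1)  # hard per-pixel prediction; the "noise" shows up
                             # as speckle in this map
```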

What could explain this noise, and how could I potentially address the issue?



I'm going to attempt to answer my own question, but I won't accept it as the answer, simply because I'm sure there is more than one reason why this happens. I've largely solved the issue by enlarging the segmented areas and adding more "features": I made sure there is more text, table boxes and other visual structure for the convolutions to pick up. To a certain degree this has helped a lot. What also helped is using a more modern model: I tried a U-Net with a ResNet-34 encoder, and I also tried a DeepLabV3, which outperformed all the others. So I suspect that the "noise" (for lack of a better word) is a by-product of the network being uncertain about where exactly the boundaries of the segmentation are, due to an absence of features. I suspect the more modern models are better suited to dealing with that problem.
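In case it helps others, models like these are available off the shelf, so the swap is cheap to try. A minimal sketch, assuming the segmentation_models_pytorch package for the U-Net and torchvision (0.13 or newer) for DeepLabV3; the ResNet-50 backbone for DeepLabV3 is just one option:

```python
import torch
import segmentation_models_pytorch as smp  # pip install segmentation-models-pytorch
from torchvision.models.segmentation import deeplabv3_resnet50

unet = smp.Unet(
    encoder_name="resnet34",     # ResNet-34 encoder
    encoder_weights="imagenet",  # pretrained features help with small datasets
    in_channels=3,
    classes=2,                   # two document classes
)

deeplab = deeplabv3_resnet50(weights=None, num_classes=2)

x = torch.randn(1, 3, 224, 224)
print(unet(x).shape)            # torch.Size([1, 2, 224, 224])
print(deeplab(x)["out"].shape)  # torch.Size([1, 2, 224, 224])
```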

EDIT

So far what seems to have worked for me is DeepLabV3 with ResNet encoders, but another thing that really seems to make a difference is how "large" the segmented area is. Obviously in some domains this cannot be adjusted (e.g., robotics, autonomous vehicles and such), but in document or text processing it most likely can. What I've noticed is that after downsizing to 224x224, smaller and thinner areas become very hard, if not impossible, to learn, whereas larger and thicker areas are easier. I suspect that averaging the cross-entropy loss might be a culprit here: because the loss is a mean over all pixels, errors confined to a small region barely move it, so the network can tolerate mistakes there, whereas errors in a large region dominate the average and get corrected. A loss mechanism that emphasises errors in small regions might help.
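One candidate for such a mechanism is Dice loss, which normalises overlap by region size, so a missed thin or small region hurts the score roughly as much as a missed large one. A rough sketch of combining it with the usual averaged cross-entropy (untested on this exact problem):

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, targets, eps=1.0):
    """Soft Dice loss: overlap is normalised per class by region size,
    so small regions are not drowned out by the pixel average."""
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial dims, keep per-class
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2 * intersection + eps) / (cardinality + eps)
    return 1 - dice.mean()

# Combining with the usual averaged CE keeps per-pixel gradients
# while re-weighting errors in small regions.
logits = torch.randn(4, 2, 224, 224, requires_grad=True)
targets = torch.randint(0, 2, (4, 224, 224))
loss = F.cross_entropy(logits, targets) + soft_dice_loss(logits, targets)
loss.backward()
```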
