Neural network architecture to automatically crop a photo of a paper sheet

Given an RGB photo of a sheet of paper with text, I want to produce an output image that is cropped and deskewed. Example of input:

I have tried non-AI tools (such as OpenCV's findContours) to find the 4 corners of the sheet, but they are not very robust under some lighting conditions, or when there are other objects in the photo.

So I see two options:

  • a NN with input=image, output=image, that does everything (including the deskewing and even the brightness adjustment). I would simply train it on thousands of images.

  • a NN with input=image, output=coordinates_of_4_corners. I would then do the cropping + deskewing with a homography (perspective transform), and the brightness adjustment with standard non-AI tools.
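To make the second option concrete, here is a minimal NumPy sketch of the cropping + deskewing step once the 4 corners are known (however they were obtained). The function names `order_corners`, `homography` and `warp_sheet` are hypothetical helpers written for this illustration, the output size is an assumption the caller has to supply, and sampling is plain nearest-neighbour to keep it short:

```python
import numpy as np

def order_corners(pts):
    """Order 4 (x, y) points as top-left, top-right, bottom-right, bottom-left.

    Assumption: the sheet is not rotated close to 45 degrees, otherwise the
    sum/difference heuristic below becomes ambiguous.
    """
    pts = np.asarray(pts, dtype=float)
    s = pts.sum(axis=1)          # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 0] - pts[:, 1]    # x - y: largest at top-right, smallest at bottom-left
    return np.array([pts[s.argmin()], pts[d.argmax()],
                     pts[s.argmax()], pts[d.argmin()]])

def homography(src, dst):
    """Solve for the 3x3 matrix H such that H @ [x, y, 1] ~ [u, v, 1]
    for 4 point correspondences (x, y) -> (u, v), with H[2, 2] fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_sheet(image, corners, out_w, out_h):
    """Crop and deskew `image` so the quadrilateral `corners` fills the output."""
    src = order_corners(corners)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=float)
    # Inverse mapping: for every output pixel, find where it comes from in the input.
    H_inv = homography(dst, src)
    u, v = np.meshgrid(np.arange(out_w), np.arange(out_h))
    grid = np.stack([u.ravel(), v.ravel(), np.ones(u.size)]).astype(float)
    p = H_inv @ grid
    x = np.clip(np.rint(p[0] / p[2]).astype(int), 0, image.shape[1] - 1)
    y = np.clip(np.rint(p[1] / p[2]).astype(int), 0, image.shape[0] - 1)
    return image[y, x].reshape(out_h, out_w, *image.shape[2:])
```

With OpenCV available, `cv2.getPerspectiveTransform` and `cv2.warpPerspective` cover the same two steps (with proper interpolation); the sketch above only spells out what they compute.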

Which approach would you use?

For approach #1 (input=image, output=image), what kind of NN/CNN architecture would you use?

Is approach #2 (input=image, output=coordinates) feasible? Or is there a segmentation method you would use here instead?
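On the segmentation variant: if a network predicted a binary mask of the sheet rather than coordinates, the 4 corners could in principle be recovered from the mask afterwards and fed into the same homography step. A minimal NumPy sketch of that post-processing, assuming the mask contains a single clean blob and the sheet is not rotated close to 45° (`corners_from_mask` is a hypothetical helper name, not a library function):

```python
import numpy as np

def corners_from_mask(mask):
    """Return the 4 corners (TL, TR, BR, BL) of the foreground of a binary mask
    as an array of (x, y) pixel coordinates.

    Assumption: the foreground is one roughly quadrilateral region; stray
    foreground pixels elsewhere in the mask would corrupt the result.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask has no foreground pixels")
    s = xs + ys   # smallest at top-left, largest at bottom-right
    d = xs - ys   # largest at top-right, smallest at bottom-left
    idx = [s.argmin(), d.argmax(), s.argmax(), d.argmin()]
    return np.stack([xs[idx], ys[idx]], axis=1)

# Usage: a synthetic mask with an axis-aligned "sheet"
mask = np.zeros((50, 60), dtype=np.uint8)
mask[10:30, 5:40] = 1
corners = corners_from_mask(mask)
```

A real predicted mask would likely need cleaning first (for example keeping only the largest connected component), which this sketch leaves out.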
