Extract segment from document scan

I need to extract some "valuable" information from a document scan: for example, the document's number, incoming date, organizations, persons, etc.

Example document:

I'm trying to extract the highlighted segment of the document. The original scan doesn't have that highlighting, and the value can be handwritten or typewritten.

I tried U-Net and Mask R-CNN on my dataset (~100 examples), without any success.

Any ideas?

Topic faster-rcnn cnn

Category Data Science


Hi, feeper!

I wrote a simple program to extract data from documents. It works pretty well.

https://gist.github.com/fuwiak/780cb4abbe01aa5d1438269dfa0a3cfc
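
In case the link goes stale, here is a minimal sketch of one rule-based idea. It is an illustration only and not necessarily what the gist does. It assumes an OCR step has already produced word boxes for the scan. The `Word` class, the `value_right_of` function, and the `max_gap` parameter are hypothetical names made up for this example. The idea: find a printed label such as "Date:" and take whatever sits on the same line to its right.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR word box (hypothetical structure): text plus pixel geometry."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def value_right_of(words: List[Word], label: str, max_gap: int = 200) -> Optional[str]:
    """Return the text on the same line as `label`, immediately to its right.

    Reading stops at the first horizontal gap wider than `max_gap` pixels,
    so unrelated text further along the line is not picked up.
    """
    # Match the label case-insensitively, ignoring a trailing colon.
    anchors = [w for w in words if w.text.lower().rstrip(":") == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    anchor_mid = anchor.y + anchor.h / 2

    # Keep words that start after the label and vertically overlap its midline.
    same_line = [
        w for w in words
        if w is not anchor
        and w.x >= anchor.x + anchor.w
        and w.y <= anchor_mid <= w.y + w.h
    ]
    same_line.sort(key=lambda w: w.x)

    picked = []
    prev_right = anchor.x + anchor.w
    for w in same_line:
        if w.x - prev_right > max_gap:
            break
        picked.append(w.text)
        prev_right = w.x + w.w
    return " ".join(picked) or None


# Made-up word boxes standing in for real OCR output.
words = [
    Word("Date:", 10, 10, 50, 20),
    Word("12.03.2019", 70, 12, 100, 18),
    Word("Other", 600, 12, 50, 18),
    Word("No.", 10, 50, 30, 20),
    Word("A-117", 50, 52, 60, 18),
]
date_value = value_right_of(words, "date")
number_value = value_right_of(words, "no.")
```

For handwritten values the OCR text itself may be unreliable. In that case the same label position could be used to crop the region to the right of the label and pass it to a separate handwriting recognizer.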

Best
