CNN application assessment

I would be glad if someone could give me some hints and an assessment of the following project. (I'm relatively new to ML and DL and have only a little theoretical knowledge.)

My goal is to build a detector for receipt corners in images. I have started creating a dataset of receipt images, labeled with the 4 corner points of each receipt.

My plan is to train a CNN on this dataset, and I wonder if you could give me an estimate of how many images I would need to train it successfully (a few hundred or several thousand?). Would this be a fairly simple task for the network, or a complex one due to the large number of pixels in the images?
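On the pixel-count concern: images are usually downscaled to a fixed network input size before training, and the corner labels must be scaled by the same factors. The sketch below assumes a 224 x 224 input and a 3024 x 4032 (W x H) phone photo, and uses plain nearest-neighbour sampling to stay self-contained; `resize_with_corners` is a hypothetical helper, not a library function.

```python
import numpy as np


def resize_with_corners(image, corners, out_w=224, out_h=224):
    """Downscale an H x W (x C) image by nearest-neighbour sampling and
    rescale its [[x, y], ...] corner labels to the new pixel grid."""
    h, w = image.shape[:2]
    # Source row/column for every output row/column (nearest neighbour).
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    resized = image[rows[:, None], cols[None, :]]
    # x scales with the width ratio, y with the height ratio.
    scale = np.array([out_w / w, out_h / h])
    return resized, np.asarray(corners, dtype=float) * scale


# Hypothetical example: a 3024 x 4032 (W x H) photo and its corner labels.
photo = np.zeros((4032, 3024, 3), dtype=np.uint8)
corners = [[300, 400], [2700, 420], [2750, 3600], [250, 3650]]
small, small_corners = resize_with_corners(photo, corners)
print(small.shape)  # (224, 224, 3)
print(photo.shape[0] * photo.shape[1] // (224 * 224))  # 243 (times fewer pixels)
```

The aspect ratio is not preserved here, but because x and y labels are scaled separately they stay consistent with the resized image. In practice a library resize with proper interpolation would normally replace the nearest-neighbour step.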

Edit: (Thanks for your answers so far!)

  • My data is an image with a list of the corner points of the receipt: [[x, y], [x, y], [x, y], [x, y]]
  • I'm planning to use a NN to output these 4 corner points
  • In the next step, the background will be cropped away using these 4 points
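Cropping the background with 4 corner points amounts to a perspective warp: compute the homography that maps an upright output rectangle onto the receipt quadrilateral, then sample every output pixel from the source image. A minimal NumPy sketch, assuming the corners are ordered top-left, top-right, bottom-right, bottom-left and that the output size is chosen by the caller; `homography` and `crop_receipt` are hypothetical names.

```python
import numpy as np


def homography(src, dst):
    """3x3 matrix H mapping each of 4 src [x, y] points onto its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)


def crop_receipt(image, corners, out_w, out_h):
    """Warp the quadrilateral given by corners (TL, TR, BR, BL, as [x, y])
    onto an out_w x out_h rectangle, discarding the background."""
    rect = [[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]]
    # Map output pixels back into the source image (inverse warping).
    H = homography(rect, corners)
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = H @ pts
    # Nearest-neighbour sampling, clipped to the image bounds.
    col = np.clip(np.rint(sx / sw).astype(int), 0, image.shape[1] - 1)
    row = np.clip(np.rint(sy / sw).astype(int), 0, image.shape[0] - 1)
    return image[row, col].reshape(out_h, out_w, *image.shape[2:])
```

If OpenCV is available, `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective` performs the same warp with proper interpolation; the hand-written version only shows what happens underneath.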

I started with a pre-trained ResNet18 in PyTorch and got stuck on the following questions, as the task differs from the basic classification tutorials I have found so far:

  • How should I transform the label vector containing the 4 corners?
  • What does the output look like?
  • Do I need to use an FCN for this task, as it's a kind of segmentation task?
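One common way to frame the first two questions (a sketch, not the only option) is as regression: fix a corner order, normalize the coordinates by the image width and height, and flatten them into 8 numbers. The network output is then a vector of 8 values in the same layout, which can be decoded back into pixel coordinates. The function names are hypothetical, and the corner-ordering heuristic is an assumption that works for roughly upright receipts.

```python
import numpy as np


def order_corners(corners):
    """Return the 4 [x, y] corners as TL, TR, BR, BL.

    Heuristic: x + y is smallest at top-left and largest at bottom-right;
    y - x is smallest at top-right and largest at bottom-left. It can fail
    for receipts rotated by roughly 45 degrees.
    """
    pts = np.asarray(corners, dtype=float)
    s = pts.sum(axis=1)
    d = pts[:, 1] - pts[:, 0]
    return np.array([pts[s.argmin()], pts[d.argmin()],
                     pts[s.argmax()], pts[d.argmax()]])


def encode_label(corners, width, height):
    """Ordered corners -> flat vector of 8 values in [0, 1] (the target)."""
    pts = order_corners(corners) / np.array([width, height], dtype=float)
    return pts.reshape(-1).astype(np.float32)


def decode_output(vec, width, height):
    """8 predicted values -> 4 [x, y] pixel coordinates (TL, TR, BR, BL)."""
    return np.asarray(vec, dtype=float).reshape(4, 2) * np.array([width, height])


# Corners listed in arbitrary order for a 10 x 20 (W x H) image.
target = encode_label([[0, 20], [10, 0], [0, 0], [10, 20]], width=10, height=20)
print(target)  # [0. 0. 1. 0. 1. 1. 0. 1.]
print(decode_output(target, 10, 20).tolist())
# [[0.0, 0.0], [10.0, 0.0], [10.0, 20.0], [0.0, 20.0]]
```

On the PyTorch side, one option under this framing is to replace the final layer of torchvision's ResNet18 with `nn.Linear(model.fc.in_features, 8)` and train with a regression loss such as `nn.MSELoss` or `nn.SmoothL1Loss`; framed this way, a fully convolutional segmentation network is not strictly required. The normalization should use the size of the image actually fed to the network, i.e. after any resizing.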

Topic cnn image-recognition deep-learning

Category Data Science


It's impossible to say how many images you'll need without knowing more details. You might be able to train a well-performing model with fewer than a hundred images, depending on the diversity of your data and the complexity of your use case.

In general, machine learning applications are a mix of software and data, but getting the data right is more important than the code itself. Training a model is fairly easy by now: there are tons of tutorials on how to write the code, and tools you can use for free to prototype quickly (I personally use hasty.ai). These tools are especially convenient when you're new to machine learning, as they reduce the complexity a lot. Creating the right dataset, though, is something you need to do on your own.


Training a CNN on hundreds or thousands of images requires adequate processing capacity or infrastructure. With a typical configuration of, say, 4 GB RAM, you can train the model on hundreds of images and it works fine; you can validate and test the model as well. Alternatively, you can try Google Colaboratory for your CNN model by connecting it to a hosted runtime, which assigns 12 GB RAM to each user.
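To put these RAM figures into perspective, a back-of-the-envelope estimate helps: decoded images take height x width x channels x bytes per value. The image counts and the 4032 x 3024 photo resolution below are assumptions chosen for illustration, and `images_bytes` is a hypothetical helper.

```python
def images_bytes(n_images, height, width, channels=3, bytes_per_value=1):
    """Bytes needed to hold n_images decoded images in memory at once."""
    return n_images * height * width * channels * bytes_per_value


GIB = 1024 ** 3

# 500 images downscaled to 224 x 224 RGB, stored as float32 (4 bytes).
print(round(images_bytes(500, 224, 224, bytes_per_value=4) / GIB, 2))  # 0.28
# 500 full-resolution 4032 x 3024 RGB photos, stored as uint8 (1 byte).
print(round(images_bytes(500, 4032, 3024) / GIB, 2))  # 17.03
```

So hundreds of downscaled images fit comfortably in 4 GB, whereas full-resolution photos would not all fit at once; in that case, loading images from disk batch by batch (for example with PyTorch's `torch.utils.data.DataLoader`) keeps memory bounded. Model weights, activations and gradients come on top of these numbers.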

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.