The size of the training data set in the context of computer vision

Generally speaking, when training a machine learning model, the size of the training data set should be larger than the number of predictors. For a neural network, and especially a deep learning model, the number of parameters is usually in the tens of thousands or even millions. Yet in practice the size of the training data set, i.e., the number of images, is often smaller than the number of parameters. How can this be explained? I know we can argue that a pre-trained model may remove the requirement of having that many images. Is this the only reason, or should we instead measure the size of the training data set by the number of pixels multiplied by the number of images?
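For a concrete sense of the numbers involved, here is a back-of-envelope sketch. The model and dataset sizes are illustrative assumptions (roughly in the range of a ResNet-50 trained on ImageNet-1k), not figures from the question:

```python
# Back-of-envelope comparison of parameter count vs. image count
# (illustrative numbers, roughly ResNet-50 on ImageNet-1k).
num_parameters   = 25_600_000       # ~25.6M trainable parameters
num_images       = 1_280_000        # ~1.28M training images
pixels_per_image = 224 * 224 * 3    # RGB images at 224x224

# Counting images, there are far fewer examples than parameters...
print(f"parameters per image : {num_parameters / num_images:.1f}")

# ...but counting pixels, the training set dwarfs the parameter count.
print(f"pixels per parameter : {num_images * pixels_per_image / num_parameters:.0f}")
```

Counting images, there are roughly 20 parameters per training example; counting pixels, there are thousands of input values per parameter.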

Topic image-recognition image-classification computer-vision deep-learning neural-network

Category Data Science


Your second hypothesis is on the right track. Try comparing the information content of the training set with the information content of the network parameters. Of course most images are compressible, but they don't compress down to anywhere near the size of a single floating-point number, which is how each network parameter is typically stored.
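As a rough illustration of that comparison, the sketch below counts the raw bits on each side, using the same illustrative sizes as above (they are assumptions, not figures from the answer). Even after allowing a generous 10:1 compression ratio for the images, the training set still carries far more information than the parameters:

```python
# Rough information-content comparison (all sizes are illustrative assumptions).
num_images        = 1_280_000     # training images
pixels_per_image  = 224 * 224     # spatial resolution
bits_per_pixel    = 8 * 3         # 8-bit RGB
compression_ratio = 10            # assume ~10:1 compression of natural images

num_parameters = 25_600_000
bits_per_param = 32               # float32 weights

data_bits  = num_images * pixels_per_image * bits_per_pixel / compression_ratio
param_bits = num_parameters * bits_per_param

print(f"training set ≈ {data_bits / 8e9:.1f} GB (after assumed compression)")
print(f"parameters   ≈ {param_bits / 8e9:.2f} GB")
print(f"ratio        ≈ {data_bits / param_bits:.0f}x")
```

Under these assumptions the compressed training set is still two orders of magnitude larger than the parameters, so measuring the data by image count alone understates how much information the network is actually fit to.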
