What features used by a CNN model should a feature store actually store?

According to MLOps principles, it is recommended to have a feature store. My question concerns image classification with deep learning models such as convolutional neural networks, which do automatic feature engineering (using convolution layers) as part of the training process.

Questions

  1. Does it make sense to have a feature store for pure image classification/segmentation models?
  2. What features should be stored in the feature store? The output of the convolution layers? But then they cannot be reused during training, since the convolution layers will rebuild them during training anyway.



Disclaimer: I am one of the authors of the referenced link.

  1. If you are doing pure image classification/segmentation, I would say no.
  2. I would not normally store the output of convolution layers. In principle, however, the output of models can be stored in the feature store.

Feature stores store data in tabular file formats (like Parquet) and are used to store both immutable data (like images) and features that change over time (such as the number of clicks on a website in the last hour). These latter historical/contextual features need to be constantly updated by feature pipelines. Typically, you cache them in an online feature store so that your model can retrieve them to build feature vectors. You could store image data in the feature store, but I would normally just store the path to each image (on a distributed file system or object store), along with all the metadata about the image that I need to make predictions. Storing image paths makes it easy to retrieve the images later and build training datasets in file formats designed for efficient training of DNNs on image data, like Petastorm.
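To make the "store the path, not the pixels" idea concrete, here is a minimal sketch in plain Python. It is not the API of any particular feature store: the `ImageRecord` fields, the `build_training_manifest` helper, and the bucket paths are all hypothetical names I made up for illustration. It only shows how image metadata kept in a table could be filtered to produce the list of paths that a later step would read when writing the training dataset.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ImageRecord:
    """One row of a hypothetical feature-store table describing an image."""
    image_id: str
    image_path: str       # location on an object store or distributed file system
    label: str
    width: int
    height: int
    captured_at: datetime


def build_training_manifest(
    records: Iterable[ImageRecord],
    labels: Set[str],
    min_side: int,
) -> List[Tuple[str, str]]:
    """Select rows by metadata and return (image_path, label) pairs.

    No pixel data is read here. A later step would load each path and
    write the training dataset in whatever format the trainer expects.
    """
    return [
        (r.image_path, r.label)
        for r in records
        if r.label in labels and min(r.width, r.height) >= min_side
    ]


# Hypothetical rows; the bucket name and paths are invented for this example.
RECORDS = [
    ImageRecord("img-001", "s3://example-bucket/images/001.jpg", "cat", 640, 480, datetime(2021, 3, 1)),
    ImageRecord("img-002", "s3://example-bucket/images/002.jpg", "dog", 128, 128, datetime(2021, 3, 2)),
    ImageRecord("img-003", "s3://example-bucket/images/003.jpg", "bird", 800, 600, datetime(2021, 3, 3)),
    ImageRecord("img-004", "s3://example-bucket/images/004.jpg", "dog", 512, 512, datetime(2021, 3, 4)),
]

# Keep only cats and dogs whose shorter side is at least 224 pixels.
manifest = build_training_manifest(RECORDS, labels={"cat", "dog"}, min_side=224)
for path, label in manifest:
    print(path, label)
```

The point of the design is that the table stays small and queryable, so you can pick a training set by label, size, or date without touching the image bytes until you actually materialize the dataset.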
