ML model deployment architecture?

I come from a software development background, where we keep separate servers for the same database (dev, test, prod). We develop our apps against the dev DB, run tests against the test DB, and prod is prod. This gives us a clear separation, so we won't bring down prod while building our app.

Do you guys train your models the same way? That is, do you have 3 environments of the same database, and as your model goes from dev to test to prod, does it train against the corresponding environment?

Example:

  1. Data scientist plays around with 3 different algos for classification and creates 3 models (A, B, C) using the dev env's database.

  2. Data scientist evaluates the 3 models and selects Model A after testing/validation.

  3. Data scientist deploys the code to TESTING/STAGING env (same hyperparameters). This time, however, the model trains using TESTING/STAGING env's databases. A version of Model A is created using data from the TESTING/STAGING databases.

  4. Data scientist deploys the code to PROD env (same hyperparameters). This time, however, the model trains using PROD env's databases. A version of Model A is created using data from the PROD databases.



Just treat the model as you would treat an image or other resource in your project. Would you test images at staging too?
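
To make the "model as a resource" idea concrete, here is a minimal sketch, assuming a toy classifier whose only learned parameter is a threshold. The names `ThresholdModel`, `build_artifact`, and `load_artifact` are hypothetical and exist only for illustration; a real project would serialize with whatever format its ML library supports. The point is that training happens once, and staging and prod both load the same verified bytes instead of retraining.

```python
import hashlib
import json


class ThresholdModel:
    """Toy stand-in for a trained classifier (hypothetical, for illustration)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x >= self.threshold)


def train(samples):
    """'Train' by placing the threshold midway between the two class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def build_artifact(model):
    """Serialize the model once; the digest identifies this exact build."""
    blob = json.dumps({"threshold": model.threshold}, sort_keys=True).encode("utf-8")
    return blob, hashlib.sha256(blob).hexdigest()


def load_artifact(blob, expected_digest):
    """What each environment does: verify the bytes, then load. No retraining."""
    if hashlib.sha256(blob).hexdigest() != expected_digest:
        raise ValueError("artifact does not match the approved build")
    return ThresholdModel(**json.loads(blob))


# Training happens exactly once, on the one training dataset (made-up values).
training_data = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
blob, digest = build_artifact(train(training_data))

# Staging and prod receive the same bytes and digest, like any other build asset.
staging_model = load_artifact(blob, digest)
prod_model = load_artifact(blob, digest)
```

Because both environments load the same artifact, what you verify in staging is exactly what runs in prod.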


No, you don't train against 3 separate databases, because for each dataset you would need to redo the whole process from scratch (exploratory data analysis, feature engineering, etc.). You need one valid training dataset, from which you carve out your validation dataset and test dataset (a hold-out set used as a final verification step). Sure, you deploy your model, for example as a web service on a test server, to make sure it is working as expected. But model verification, including checking for overfitting or underfitting and confirming acceptable accuracy, is done prior to any deployment.
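
As a rough sketch of that workflow, assuming synthetic data and three made-up "algorithms" that each fit a single threshold (all function names here are hypothetical), the split, selection, and final hold-out check could look like this. Model selection uses only the validation set, and the test set is touched once at the end, before anything is deployed.

```python
import random


def split_dataset(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    return rows[n_test + n_val:], rows[n_test:n_test + n_val], rows[:n_test]


def accuracy(threshold, rows):
    """Fraction of rows where 'feature >= threshold' matches the label."""
    return sum(int(x >= threshold) == y for x, y in rows) / len(rows)


def fit_class_means(rows):
    """Candidate A: midpoint between the mean of each class."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def fit_boundary(rows):
    """Candidate B: midpoint between the largest negative and smallest positive."""
    return (max(x for x, y in rows if y == 0) + min(x for x, y in rows if y == 1)) / 2


def fit_overall_mean(rows):
    """Candidate C: mean of all feature values, ignoring labels."""
    return sum(x for x, _ in rows) / len(rows)


# Synthetic data for illustration only: the label is 1 when the feature is >= 50.
rows = [(x, int(x >= 50)) for x in range(100)]
train, val, test = split_dataset(rows)

# Fit every candidate on the training set only.
fitted = {
    "A": fit_class_means(train),
    "B": fit_boundary(train),
    "C": fit_overall_mean(train),
}

# Select on validation, then confirm once on the untouched hold-out set.
best = max(fitted, key=lambda name: accuracy(fitted[name], val))
holdout_accuracy = accuracy(fitted[best], test)
```

Only after `holdout_accuracy` looks acceptable would the selected model be packaged and pushed to a test server for the "does the service work" kind of check.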
