How is model evaluation and re-training done after deployment without ground truth labels?

Suppose I deployed a model after manually labeling the ground truth for my training data, because the use case is such that there's no way to get ground truth labels without humans. Once the model is deployed, if I want to evaluate how it is doing on live data, how can I do that without sampling some of that live data (which doesn't come with ground truth labels) and manually labeling it myself? And then, after evaluating performance on that labeled sample of live data, using it as the new training set for the next model. That's the only approach I can think of when ground truth can't be discerned without human intervention, and it doesn't seem very automated to me.

Is there any other way to do this without the manual labelling?

Topic model-evaluations mlops training

Category Data Science


In your scenario there's no other way: the only way to properly evaluate the model on live data is to have a sample of that live data annotated.

However, there are a few automatic checks that can be done. Even though they don't amount to a full evaluation, they can give some indication of whether the model is behaving as expected:

  • If the model can measure the likelihood of the data it receives, a decrease in this value suggests that the model is having difficulties.
  • Measuring how similar the distributions of the features and of the (predicted) target are between the training data and the live data. If they are very different, the model is likely to make mistakes (see the sketch after this list).
  • Measuring any shift in the probabilities predicted by the model. Lower predicted probabilities tend to correspond to lower confidence from the model.
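As a rough illustration of the second and third points, here is a minimal sketch in Python. It assumes tabular data held in pandas DataFrames and a scikit-learn-style classifier with a `predict_proba` method; the names `train_df`, `live_df` and `model` are illustrative, not from the question.

```python
# Minimal monitoring sketch: feature drift + prediction-confidence drift.
# Assumes pandas DataFrames and a scikit-learn-style classifier (illustrative only).
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def feature_drift_report(train_df: pd.DataFrame, live_df: pd.DataFrame,
                         alpha: float = 0.01) -> pd.DataFrame:
    """Two-sample Kolmogorov-Smirnov test per numeric feature.

    A small p-value suggests the live distribution of that feature
    differs from the training distribution.
    """
    rows = []
    for col in train_df.select_dtypes(include=np.number).columns:
        result = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        rows.append({
            "feature": col,
            "ks_stat": result.statistic,
            "p_value": result.pvalue,
            "drifted": result.pvalue < alpha,
        })
    return pd.DataFrame(rows)


def confidence_report(model, X_train: pd.DataFrame, X_live: pd.DataFrame) -> dict:
    """Compare the model's top-class predicted probabilities on training vs. live data.

    A clear drop in average confidence is a warning sign, not proof of failure.
    """
    conf_train = model.predict_proba(X_train).max(axis=1)
    conf_live = model.predict_proba(X_live).max(axis=1)
    return {
        "mean_confidence_train": float(conf_train.mean()),
        "mean_confidence_live": float(conf_live.mean()),
        "low_confidence_share_live": float((conf_live < 0.5).mean()),
    }
```

None of these numbers replaces a labelled evaluation set: they only indicate that the live data looks different from what the model was trained on, which is usually the trigger for sampling and manually labelling a new batch.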

There are probably other methods, in particular ones based on the specific task.
