Repeatability tests for machine learning models (in the sense of measurement system analysis)
When analyzing a machine learning model, we usually calculate performance metrics (such as accuracy) and, during the validation step, make sure that the model has not overfitted. A machine learning model (for example, a machine vision model) that is deployed to an industrial system to perform a classification task (e.g., defect detection) can be considered a measurement device.
From this point of view, I would like to know whether performing measurement system analysis, and specifically repeatability testing, is necessary. In the machine vision example, repeatability means taking multiple images of the same object and expecting the prediction to be the same for all of them. This would increase confidence that the model is not sensitive to the slight variations that can occur, for any reason, anywhere in the pipeline. In my opinion, testing for repeatability directly checks whether the model has overfitted as far as small changes in the input are concerned; however, it does not guarantee that the model can generalize when the input data is sufficiently varied (so it cannot guarantee that the model has not overfitted).
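To make the kind of check I have in mind concrete, here is a minimal sketch. It assumes the predictions for the repeated images of each object have already been collected. The function names, the object IDs, and the labels are all hypothetical, and the agreement score (the fraction of repeats that match the most common prediction) is just one possible choice, not a standard MSA statistic.

```python
from collections import Counter


def repeatability_report(predictions_per_object):
    """Summarize prediction agreement across repeated images of each object.

    predictions_per_object: dict mapping an object id to the list of labels
    predicted for the repeated images of that object (hypothetical format).
    """
    report = {}
    for object_id, labels in predictions_per_object.items():
        if not labels:
            raise ValueError(f"no predictions for object {object_id!r}")
        # Most common predicted label and how often it occurred.
        modal_label, modal_count = Counter(labels).most_common(1)[0]
        report[object_id] = {
            "modal_label": modal_label,
            # Fraction of repeats that agree with the most common prediction.
            "agreement": modal_count / len(labels),
            # True only if every repeat gave the same prediction.
            "repeatable": modal_count == len(labels),
        }
    return report


def overall_repeatability(report):
    """Fraction of objects whose repeated predictions were all identical."""
    return sum(entry["repeatable"] for entry in report.values()) / len(report)


# Hypothetical predictions: four repeated images per part.
example = {
    "part_001": ["ok", "ok", "ok", "ok"],
    "part_002": ["defect", "ok", "defect", "defect"],
    "part_003": ["defect", "defect", "defect", "defect"],
}

report = repeatability_report(example)
print(report["part_002"])
print(overall_repeatability(report))
```

Note that this check uses no ground-truth labels: it measures only whether the predictions agree with each other, not whether they are correct.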
What properties of the model (bias, variance, overfitting, ...) are analyzed by repeatability? And do the usual validation steps (using train and validation subsets) cover those properties?
I should mention that by repeatability I don't mean training a model multiple times and trying to arrive at exactly the same model (for example, by fixing the random seeds), which is the meaning I have seen on some webpages.