Can I compare two models trained on different but similar datasets to help find differences between the two datasets?
I have a multivariate dataset that contains samples from two groups, A and B. I want to see whether there are differences between the A and B samples. I currently have two ideas on how to do this, but I am not sure whether they are valid.
1. Train one model on A's samples and a separate model on B's samples, then compare the regression coefficients.
2. Train a model on A's samples only, then compare its prediction errors on a holdout set from A with its errors on all of B's samples, and see where the errors differ.
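To make the first idea concrete, here is a minimal sketch, assuming a plain linear regression fitted by ordinary least squares and independent samples in A and B. The synthetic data, the helper names `fit_ols` and `coefficient_z_scores`, and the choice of a Wald-style z-score for each coefficient difference are all my own assumptions for illustration, not an established recipe for this problem.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return coefficients and their standard errors."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]  # residual degrees of freedom
    sigma2 = resid @ resid / dof  # estimated noise variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)  # classical OLS covariance
    return beta, np.sqrt(np.diag(cov))


def coefficient_z_scores(X_a, y_a, X_b, y_b):
    """z-score of (beta_A - beta_B) per coefficient, assuming independent fits."""
    beta_a, se_a = fit_ols(X_a, y_a)
    beta_b, se_b = fit_ols(X_b, y_b)
    return (beta_a - beta_b) / np.sqrt(se_a**2 + se_b**2)


# Hypothetical data: A and B share the intercept and the first slope,
# but the second slope differs (-1.0 in A, 0.5 in B).
rng = np.random.default_rng(0)
n = 500
X_a = rng.normal(size=(n, 2))
X_b = rng.normal(size=(n, 2))
y_a = 1.0 + 2.0 * X_a[:, 0] - 1.0 * X_a[:, 1] + rng.normal(scale=0.5, size=n)
y_b = 1.0 + 2.0 * X_b[:, 0] + 0.5 * X_b[:, 1] + rng.normal(scale=0.5, size=n)

z = coefficient_z_scores(X_a, y_a, X_b, y_b)
print("z-scores (intercept, x1, x2):", z)
```

A large absolute z-score for a coefficient suggests that the corresponding relationship differs between A and B. This only makes sense if the linear model is a reasonable fit for both groups, and correlated predictors can make individual coefficients unstable even when the datasets are similar.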
The thought process behind the first solution is that if the two datasets are similar, the fitted regression coefficients will also be similar. My thinking on the second solution is that if the two datasets are similar enough, the model trained on A should do a relatively good job of predicting B. I could then run a statistical test to check whether the errors on B are significantly different from the errors on the holdout of A.
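The second idea could be sketched as follows. This is only an illustration under my own assumptions: the data are synthetic, the model is ordinary least squares, the errors are absolute residuals, and the test is a simple two-sample permutation test on the difference in mean error. The helper names `fit_ols`, `predict`, and `permutation_p_value` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return coefficients (intercept first)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict(beta, X):
    """Predict with coefficients returned by fit_ols."""
    return beta[0] + X @ beta[1:]


def permutation_p_value(err_1, err_2, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean error."""
    rng = np.random.default_rng(seed)
    observed = abs(err_1.mean() - err_2.mean())
    pooled = np.concatenate([err_1, err_2])
    n1 = len(err_1)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle group labels
        if abs(perm[:n1].mean() - perm[n1:].mean()) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical data: B has a different second slope than A.
rng = np.random.default_rng(1)
X_a = rng.normal(size=(600, 2))
y_a = 1.0 + 2.0 * X_a[:, 0] - 1.0 * X_a[:, 1] + rng.normal(scale=0.5, size=600)
X_b = rng.normal(size=(300, 2))
y_b = 1.0 + 2.0 * X_b[:, 0] + 0.5 * X_b[:, 1] + rng.normal(scale=0.5, size=300)

# Train on part of A, keep the rest of A as a holdout.
X_train, y_train = X_a[:400], y_a[:400]
X_hold, y_hold = X_a[400:], y_a[400:]
beta = fit_ols(X_train, y_train)

err_hold = np.abs(y_hold - predict(beta, X_hold))  # errors on A's holdout
err_b = np.abs(y_b - predict(beta, X_b))  # errors on all of B

p_value = permutation_p_value(err_hold, err_b)
print("mean abs error, A holdout:", err_hold.mean())
print("mean abs error, B:", err_b.mean())
print("permutation p-value:", p_value)
```

A small p-value suggests the model trained on A transfers poorly to B. A large p-value does not by itself confirm that the datasets are the same, and a difference in errors can come either from a changed relationship between predictors and target or from B covering a different region of the predictor space.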
Does this line of thinking make sense? Would differences in the coefficients and errors signify anything about the differences between the datasets? I have a feeling that this is not feasible, since I have not been able to find examples of it anywhere.
Topic methodology regression statistics machine-learning
Category Data Science