Dealing with historical data drift
I'm trying to predict a continuous target in an industrial context.
The problem I'm facing is that some of the predictors have changed over time; for example, the pressure in the machine was increased. This influenced some of the other predictors, but hasn't influenced my target.
As an example (in R formula notation):
$Y \sim U_1$: the target depends on some unobservable variable.
$X_j \sim U_1 + X_i$: one of my observed variables depends on the unobservable variable and on another observed variable. Therefore $X_j$ is helpful for predicting $Y$.
Now $X_i$ has changed a couple of times. This clearly hasn't influenced my target. But I'm also not really able to learn the relation $Y \sim X_j$ because $X_j$ has changed with $X_i$.
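To make this concrete, here is a small simulated sketch of that setup. All of the numbers, change points, and effect sizes are invented purely for illustration; my real data looks different and has far more variables.

```python
# Hypothetical simulation of the setup above:
#   Y   ~ U_1            target depends only on the unobservable driver
#   X_j ~ U_1 + X_i      observed predictor depends on U_1 and on X_i
#   X_i                  changes in steps over time (e.g. a pressure setting)
import numpy as np

rng = np.random.default_rng(0)
n = 3000
u1 = rng.normal(size=n)                                    # unobservable driver
xi = np.repeat([10.0, 14.0, 18.0], n // 3) + rng.normal(scale=0.5, size=n)
xj = 2.0 * u1 + 0.8 * xi + rng.normal(scale=0.5, size=n)   # shifts along with X_i
y = 3.0 * u1 + rng.normal(scale=0.5, size=n)               # unaffected by X_i

# Pooled over all periods, the X_j–Y relation is diluted by the level shifts in X_i ...
print(np.corrcoef(xj, y)[0, 1])                # roughly 0.6 with these settings
# ... while within a single stable period it is much stronger.
print(np.corrcoef(xj[:1000], y[:1000])[0, 1])  # roughly 0.95
```

With these made-up settings, the pooled correlation between $X_j$ and $Y$ is much weaker than the within-period correlation, which is essentially the problem I'm running into at scale.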
I know some of these dependencies for a fact from physics, but there's no way I can fix this by hand because there are about 1000 variables.
When reading about data drift, I only find advice on how to adapt an existing model to a sudden change, but in my case the changes already happened in the past.
The time periods in which nothing changed are too short to just train on the latest batch, but simply using the whole dataset without any adjustment doesn't seem to work either.
Can anyone give advice on how to address this?
(Right now I'm using XGBoost, but I'm open to other models.)
Topic concept-drift
Category Data Science