Incorporating data over time into lightgbm

Question


user1777900

February 7, 2022, 22:10

I'm in a situation where I know what I'm trying to find, but not the terminology for it, and I think that's why many of my Google searches are sending me in the wrong direction. Apologies if some of this explanation ends up redundant.

Essentially, I want to incorporate historical trends into the LightGBM model I've been using. I currently have a lot of categorical health data, but by default the model only looks at each value in isolation. Take blood pressure (BP), for example. Currently, the model only gets a single BP measurement per row. So while it sees the historical values in its training set, it doesn't (as far as I understand) take trends into account (like BP going up over the last 5 encounters).

A lot of my preliminary research pointed me towards time series forecasting, but that's not quite what I want. I'm not trying to extrapolate the next blood pressure measurement from the trend (which WOULD be time series forecasting); I'm trying to incorporate "blood pressure is trending upwards, so this person is more at risk".

I've read about converting time series into tabular form with lagged values, but everything I've found so far uses this to predict the next value in the series, rather than to incorporate trends over time AS a feature. I think the right approach might be something like turning BP from a single value into a window of values (e.g., BP over the last 5 encounters) for each row, but that's me guessing from cobbling together what I've read.
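A minimal sketch of the windowing idea described above, assuming encounter records already sorted by patient and then by time. The record layout, the window size of 5 and the feature names are hypothetical. Each row keeps its own BP but also gets a summary of the same patient's most recent readings, including a least-squares slope that is positive when BP has been trending upwards. Only readings up to and including the current encounter are used, so no later information leaks into a row.

```python
from collections import defaultdict, deque

# Hypothetical encounter records: (patient_id, encounter_index, systolic_bp),
# assumed to be sorted by patient and then by encounter time.
records = [
    ("p1", 1, 120), ("p1", 2, 124), ("p1", 3, 129), ("p1", 4, 133),
    ("p2", 1, 140), ("p2", 2, 138), ("p2", 3, 135),
]

WINDOW = 5  # hypothetical: number of most recent encounters to summarise


def slope(values):
    """Least-squares slope of values against 0, 1, ..., n-1 (None if n < 2)."""
    n = len(values)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def add_trend_features(rows, window=WINDOW):
    """Return one feature dict per row with window mean and slope of BP."""
    # One bounded history per patient; old readings drop out automatically.
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for pid, enc, bp in rows:
        hist = history[pid]
        hist.append(bp)  # the current reading is part of its own window
        out.append({
            "patient_id": pid,
            "encounter": enc,
            "bp": bp,
            "bp_window_mean": sum(hist) / len(hist),
            "bp_window_slope": slope(list(hist)),
        })
    return out


features = add_trend_features(records)
```

The slope is `None` for a patient's first encounter, since a single reading has no trend; converting it to NaN would let LightGBM treat it as a missing value.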

Or perhaps I'm misunderstanding and this isn't the right approach at all. Or maybe LightGBM isn't suited for this (I know XGBoost has problems with extrapolation, and it's in a similar gradient boosting family).

Topic lightgbm time-series python

Category Data Science

Peter · Accepted Answer · February 7, 2022, 22:10

LightGBM and XGBoost use essentially the same type of model, namely boosting. Usually, boosting is "tree-based", meaning that (shallow) regression/classification trees are fitted successively to the residuals (see here and here for some posts on boosting).
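To make "fitted successively to the residuals" concrete, here is a minimal sketch of boosting with squared-error loss, where each stage fits a depth-1 tree (a stump) on one feature to the current residuals and adds a shrunken copy of it to the model. This is only an illustration of the principle, not how LightGBM or XGBoost are implemented (they use deeper trees, gradient/Hessian statistics, histogram-based split finding and much more); the function names, learning rate and toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the threshold on x that best splits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    # Each leaf predicts the mean residual that fell into it.
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


x = list(range(1, 11))    # time index t = 1..10
y = [2.0 * t for t in x]  # a purely linear trend
model = boost(x, y)
```

Within the training range the fit improves round by round, but `model(20)` equals `model(10)`: beyond the last training point every stump sends the input to the same leaf, so the prediction stays flat. This is the extrapolation limitation mentioned in the question.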

Tree-based models are often "bad" at incorporating (linear) time trends. They tend to "ignore" or misinterpret time trends encoded as dummy variables. Adding the time trend as an integer (t = 1, 2, 3, ...) may work, since the trees then get a monotonically increasing value to split on.
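A toy comparison of the dummy-versus-integer point, assuming hypothetical data with a purely linear trend over 10 periods. For each encoding it scores the best *first* split a regression tree could make by the squared error remaining after that split. It only looks at one split; deeper trees can combine several dummies, but they need many more splits to express the same ordering.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split_sse(mask_candidates, y):
    """Smallest total SSE over candidate binary partitions (boolean masks) of y."""
    best = float("inf")
    for mask in mask_candidates:
        left = [yi for yi, m in zip(y, mask) if m]
        right = [yi for yi, m in zip(y, mask) if not m]
        if left and right:
            best = min(best, sse(left) + sse(right))
    return best


periods = list(range(1, 11))    # t = 1..10
y = [2.0 * t for t in periods]  # a purely linear trend

# Integer encoding: a single feature t, candidate splits "t <= c".
integer_masks = [[t <= c for t in periods] for c in periods[:-1]]
# One-hot ("dummy") encoding: one column per period; a split isolates one period.
dummy_masks = [[t == k for t in periods] for k in periods]

print(sse(y), best_split_sse(integer_masks, y), best_split_sse(dummy_masks, y))
# prints 330.0 80.0 240.0
```

With the integer encoding the best split (t <= 5) cuts the squared error from 330 to 80, while the best one-hot split only reaches 240, because a single dummy can separate just one period from all the others.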

However, truly linear trends are usually picked up well by linear models, so combining (stacking) linear models and (tree-based) boosting could be an option.
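One simple way to combine the two is sketched below, assuming a single time index `t` and one binary feature `group` (both hypothetical, as is the toy data). Strictly speaking this is a two-stage residual hybrid rather than stacking (where a meta-model combines the base models' predictions): a linear model captures the trend, and a tree-style model fits what is left. Here the second stage is a stand-in for the boosted trees: a depth-1 split on one binary feature, which reduces to the mean residual in each branch.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    return y_mean - b * x_mean, b


def fit_hybrid(t, group, y):
    """Stage 1: linear trend in t. Stage 2: mean residual per group value."""
    a, b = fit_line(t, y)
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    sums, counts = {}, {}
    for g, r in zip(group, residuals):
        sums[g] = sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    offsets = {g: sums[g] / counts[g] for g in sums}
    # Final prediction = linear trend + residual correction for the group.
    return lambda ti, g: a + b * ti + offsets.get(g, 0.0)


# Hypothetical toy data: linear trend in t plus a +5 shift when group == 1.
t = list(range(1, 11))
group = [1 if ti % 2 == 0 else 0 for ti in t]
y = [3.0 + 2.0 * ti + 5.0 * g for ti, g in zip(t, group)]

model = fit_hybrid(t, group, y)
```

Because the first stage is linear, `model(20, 1)` keeps following the trend beyond the training range instead of flattening out as a tree alone would, while the second stage still reduces the error left by the straight line.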

Having a "window" of BP is essentially the same thing as using lags (i.e., adding the BP of the last period, the BP of the period before that, etc.). This should also work well with (tree-based) boosting. So you would have BP as a feature plus, in addition, several features holding lagged BP values. I suggest trying this with boosting and comparing it to a linear model. That said, one or two lagged BP values should be enough, I guess.
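A minimal sketch of the lag features suggested above, assuming rows sorted by patient and then by encounter time; the field names, toy rows and `n_lags=2` are hypothetical choices. Lags are computed within each patient, so one patient's history never leaks into another's, and each row only sees earlier readings.

```python
import math
from collections import defaultdict


def add_lags(rows, n_lags=2):
    """Add bp_lag1..bp_lagN (previous readings of the same patient) to each row.

    rows: list of dicts with 'patient_id' and 'bp', sorted by patient and time.
    Missing lags are NaN, which LightGBM treats as missing values by default.
    """
    history = defaultdict(list)
    out = []
    for row in rows:
        past = history[row["patient_id"]]
        new_row = dict(row)
        for k in range(1, n_lags + 1):
            new_row[f"bp_lag{k}"] = past[-k] if len(past) >= k else math.nan
        past.append(row["bp"])  # update history only after the lags are read
        out.append(new_row)
    return out


rows = [
    {"patient_id": "p1", "bp": 120},
    {"patient_id": "p1", "bp": 125},
    {"patient_id": "p1", "bp": 131},
    {"patient_id": "p2", "bp": 140},
]
lagged = add_lags(rows)
```

The resulting `bp`, `bp_lag1` and `bp_lag2` columns can serve as features for LightGBM (for example `lightgbm.LGBMClassifier`) and, once the NaNs are imputed or the affected rows dropped, for the linear model you compare against.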
