How to update the posterior belief when we are observing a stream of correlated data from a fixed but unknown data source

Question


Moh

April 19, 2022, 14:01

I want to build a [probabilistic] model that aims to infer the true value of an unknown categorical variable, $y \in \{1,2,..., K\}$.

We have a dataset $(X,y): \mathbb{R}^d\rightarrow \{1,2,..., K\}$ and we can train a classifier that takes $d$-dimensional data, $X$, and estimates the output $y$.

Now, suppose that the $X$s are correlated and all come from a fixed $y$. That is, we observe $X^1, X^2, ..., X^T, ...$ over time, and we know that $y$ is the same for all of them.

For example:

We receive $X^1$ (at time $t=1$) and our previously trained classifier produces a guess $\hat{y}^1$.
Then, we receive $X^2$, and we again use the classifier to guess $\hat{y}^2$.
Then, we receive $X^3$, and so on.

So, at time $t=T$ we have $\hat{y}^1, \hat{y}^2, ..., \hat{y}^T$.

Now, the question is: How can I build a model that uses these estimates ($\hat{y}^1, \hat{y}^2, ..., \hat{y}^T$) to improve my belief about the true $y$ over time, given that:

The dimension $d$ is not small, e.g. $d > 50$.
The data samples, $X$s, are not i.i.d., but all come from a fixed unknown $y$.
The classifier is not optimal (it was just trained on some available data), and at each round it gives an estimate $\hat{y}^t$ for the current $X^t$.

I have been reading around and came across the following, but I am not sure which one is most worth investigating further:

Sequential Hypothesis Testing
Optimal stopping
Sequential probability ratio test
HDI+ROPE decision rule: highest density interval (HDI) and region of practical equivalence (ROPE)

Or is there any specific Bayesian framework for it?

Topic bayesian sequential-pattern-mining classification time-series

Category Data Science

Michael Hearn · Accepted Answer · November 15, 2019, 06:29

I think all of the options you list could work for the problem you describe. It sounds like y is like a die: each x is a roll of that die, unpredictable on its own but still tied to y, and you want to infer y from the x values. This resembles a hidden Markov model.
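To make the die picture concrete, here is a minimal sketch of a recursive Bayesian update over the K classes, driven only by the classifier's guesses. It rests on assumptions that are not in your question: labels are 0-indexed, `confusion[k][j]` is a hypothetical estimate of P(guess = j | true y = k) taken from held-out data, and the guesses are treated as conditionally independent given y. Since your data are correlated, that last assumption is violated, so the `temper` exponent is included as a heuristic way to down-weight each observation, not as a principled correction.

```python
import math

def update_log_posterior(log_post, y_hat, confusion, temper=1.0):
    """One recursive Bayes step: fold a single classifier guess into the belief.

    log_post  : list of log P(y = k | guesses so far), one entry per class
    y_hat     : the classifier's current guess (0-indexed class label)
    confusion : confusion[k][j] ~ P(guess = j | true y = k); entries must be > 0
    temper    : exponent in (0, 1]; 1.0 assumes conditionally independent guesses,
                smaller values are a heuristic discount for correlated guesses
    """
    unnorm = [lp + temper * math.log(confusion[k][y_hat])
              for k, lp in enumerate(log_post)]
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(unnorm)
    log_z = m + math.log(sum(math.exp(v - m) for v in unnorm))
    return [v - log_z for v in unnorm]

# Hypothetical 3-class confusion matrix (each row sums to 1).
K = 3
confusion = [
    [0.6, 0.2, 0.2],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
]

# Start from a uniform prior and process a stream of guesses one at a time.
log_post = [math.log(1.0 / K)] * K
for y_hat in [0, 0, 1, 0]:
    log_post = update_log_posterior(log_post, y_hat, confusion)

posterior = [math.exp(v) for v in log_post]
print(posterior)
```

If the classifier outputs class probabilities rather than hard labels, the same loop can be adapted, but the likelihood term would then need its own modeling choice.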

Because you want an estimate of y at each x, and the x's are correlated, I believe an LSTM may be of benefit, if you want to use a neural network.

Optimal stopping, sequential hypothesis testing, the sequential probability ratio test, and HDI+ROPE will all work for the abstract problem you describe. Until you give us more details about your problem, such as what specifically you will be working with, it is hard to give you more specific direction.

If you create an LSTM that takes in an x and makes a guess about which y produced it, and train it on the x data you have, then you would have the predictive model you seek.
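For a feel of how the sequential probability ratio test would look here, below is a rough sketch of Wald's two-hypothesis version applied to the stream of classifier guesses. Everything named in it is an assumption for illustration: `lik_h1` and `lik_h0` are hypothetical probabilities of each guess under two candidate values of y, labels are 0-indexed, and the error targets `alpha` and `beta` are arbitrary. The classical thresholds assume i.i.d. observations under each hypothesis, so with correlated data the stated error rates would hold only approximately, if at all.

```python
import math

def sprt(y_hats, lik_h1, lik_h0, alpha=0.05, beta=0.05):
    """Wald's SPRT on a stream of classifier guesses.

    lik_h1[j] / lik_h0[j] : P(guess = j) under hypothesis H1 / H0 (must be > 0)
    alpha, beta           : target type I and type II error rates
    Returns (decision, number of guesses consumed).
    """
    upper = math.log((1 - beta) / alpha)   # cross this to accept H1
    lower = math.log(beta / (1 - alpha))   # cross this to accept H0
    llr = 0.0                              # running log-likelihood ratio
    for t, y_hat in enumerate(y_hats, start=1):
        llr += math.log(lik_h1[y_hat] / lik_h0[y_hat])
        if llr >= upper:
            return "accept H1", t
        if llr <= lower:
            return "accept H0", t
    return "continue", len(y_hats)         # not enough evidence yet

# Hypothetical guess distributions: H1 says y = 0, H0 says y = 1.
lik_h1 = [0.6, 0.2, 0.2]
lik_h0 = [0.2, 0.6, 0.2]

result = sprt([0, 0, 1, 0, 0], lik_h1, lik_h0)
print(result)
```

With K classes rather than two, one option is to run such a test for the leading class against each competitor, which is again a design choice rather than something your problem statement dictates.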
