How to update the posterior belief when we are observing a stream of correlated data from a fixed but unknown data source
I want to build a [probabilistic] model that aims to infer the true value of an unknown categorical variable, $y \in \{1,2,..., K\}$.
We have a dataset $(X,y): \mathbb{R}^d\rightarrow \{1,2,..., K\}$ and we can train a classifier that gives $d$-dimensional data, $X$, and estimates the output $y$.
Now, suppose that $X$s are correlated and all coming from a fixed $y$. I mean, we are observing $X^1, X^2,...., X^T,...$ over time and we know that $y$ is fixed for all of them.
For example:
- We receive $X^1$ (at time $t=1$) and our previously trained classifier produces a guess about $\hat{y}^1$.
- Then, we receive $X^2$, and we again use the classifier to guess $\hat{y}^2$.
- Then, we receive $X^3$, and so on.
So, at time $t=T$ we have $\hat{y}^1, \hat{y}^2, ..., \hat{y}^T$.
Now, the question is: How can I make a model to use these estimations ($\hat{y}^1, \hat{y}^2, ..., \hat{y}^T$) and improve my belief about the true $y$ over time, considering that:
dimension $d$ is not small. e.g. $d 50$
data samples, $X$s, are not i.i.d. but all coming from a fixed unknown $y$.
classifier is not optimal (just trained on some available data) and at each round gives an estimate about the $\hat{y}^t$ for the current $X^t$.
I have been reading some materials and came across the following but I am not sure which one is better to investigate more into:
- Sequential Hypothesis Testing
- Optimal stopping
- Sequential probability ratio test
- HDI+ROPE decision rule: highest density interval (HDI) region of practical equivalence (ROPE)
Or is there any specific Bayesian framework for it?
Topic bayesian sequential-pattern-mining classification time-series
Category Data Science