Can Expectation Maximization estimate the truth and confusion matrices from multiple noisy sources?
Suppose we have $m$ sources, each of which noisily observes the same set of $n$ independent events drawn from the outcome set $\{A,B,C\}$. Each source $i$ has a confusion matrix, for example:
$$C_i = \begin{bmatrix} 0.98 & 0.01 & 0.04 \\ 0.01 & 0.97 & 0.00 \\ 0.01 & 0.02 & 0.96 \end{bmatrix}$$
where each column corresponds to the truth and each row to the observation, so each column sums to 1. E.g., if the true event is $B$ then source $i$ observes $B$ 97% of the time, $A$ 1% of the time, and $C$ 2% of the time. We can assume the diagonal elements are > 95%.
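For concreteness, here is a minimal sketch of this generative model in NumPy (the variable names, event count, and 0/1/2 encoding of $A/B/C$ are mine, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Column-stochastic confusion matrix: column t is the pmf of the
# observation when the truth is t (rows and columns ordered A, B, C).
C = np.array([[0.98, 0.01, 0.04],
              [0.01, 0.97, 0.00],
              [0.01, 0.02, 0.96]])

truths = rng.integers(0, 3, size=10)                        # hidden true events
obs = np.array([rng.choice(3, p=C[:, t]) for t in truths])  # one source's noisy view
```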
Given a sequence of $n$ events, where event $j$ was observed by source $i$ as $O_{i,j}$, it is straightforward to estimate the pmf of the truth $T_j$ by computing $P(T_j \mid O_{1,j},\dots,O_{m,j})$ with Bayes' formula, assuming the sources err independently given the truth (and given some reasonable prior over the events themselves, say uniform).
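A minimal sketch of that Bayes step, under the same conditional-independence assumption (the function name and encoding are my own):

```python
import numpy as np

def truth_posterior(obs_j, confusions, prior=None):
    """P(T_j | O_{1,j}, ..., O_{m,j}) for a single event j.

    obs_j      : length-m sequence of observed outcomes (0=A, 1=B, 2=C)
    confusions : (m, 3, 3) array; confusions[i, o, t] = P(source i sees o | truth t)
    prior      : length-3 pmf over the truth (uniform if None)
    """
    K = confusions.shape[-1]
    post = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, float)
    for i, o in enumerate(obs_j):
        post = post * confusions[i, o, :]  # multiply in the likelihood P(O_i = o | T = t)
    return post / post.sum()               # normalize back to a pmf

# Example: three identical sources all report A (index 0).
C = np.array([[0.98, 0.01, 0.04],
              [0.01, 0.97, 0.00],
              [0.01, 0.02, 0.96]])
print(truth_posterior([0, 0, 0], np.stack([C, C, C])))  # heavily favors truth A
```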
However, suppose we had neither the confusion matrices nor the ground truth, and wanted to estimate both. One algorithm is:
- Start with a reasonable initial confusion matrix $C_{i,0}$ for each source
- Fixing the confusion matrices $C_{i,k}$, estimate the most likely truths $T_{j,k}$ using Bayes' formula
- Fixing the truths $T_{j,k}$, estimate new confusion matrices $C_{i,k+1}$ from how often each source (allegedly) got it wrong
- Repeat the last two steps, incrementing $k$, until convergence (a code sketch of this loop follows below)
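Here is a sketch of that loop in NumPy, under the same assumptions as above (uniform class prior, 0/1/2 outcome encoding; all names are mine). One caveat: as described, the second step keeps only the single most likely truth, which is the "hard" (classification) EM variant; the standard EM E-step instead carries the full posterior over $T_j$ into the M-step, which is what this sketch does. Swapping `post` for a one-hot argmax recovers the hard version:

```python
import numpy as np

def estimate_truths_and_confusions(obs, K=3, n_iter=50):
    """obs: (m, n) int array, obs[i, j] = what source i saw for event j.

    Returns (post, conf): per-event truth posteriors (n, K) and
    per-source confusion matrices (m, K, K) with columns summing to 1.
    """
    m, n = obs.shape
    # Step 1: start every source diagonal-heavy, as a "reasonable" initializer.
    conf = np.full((m, K, K), 0.05 / (K - 1))
    for i in range(m):
        np.fill_diagonal(conf[i], 0.95)
    for _ in range(n_iter):
        # E-step: posterior over each event's truth, given current confusions.
        log_post = np.zeros((n, K))
        for i in range(m):
            log_post += np.log(conf[i, obs[i], :] + 1e-12)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: conf[i, o, t] ∝ expected count of (source i saw o, truth was t).
        for i in range(m):
            for o in range(K):
                conf[i, o, :] = post[obs[i] == o].sum(axis=0)
            conf[i] /= conf[i].sum(axis=0, keepdims=True)  # each column back to a pmf
    return post, conf
```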
This seems like the EM algorithm but I don't know how to show this formally. (No this is not homework.)
1) Is this EM, or some other standard algorithm in data fusion?
2) Does it have convergence guarantees?
3) Does it have any guarantees about the quality of the solution and how well the final confusion matrices will approximate the true confusion matrices?
4) Are there issues about the number of parameters being estimated vs. the number of samples? E.g., it seems there are $n + 6m$ parameters being estimated: the $n$ truths and the $6m$ free elements across all confusion matrices (the last cell of each column is determined by the others, since columns sum to 1).
EDIT
These two papers describe exactly this problem and how to solve it with EM:
Dawid & Skene, Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm: http://crowdsourcing-class.org/readings/downloads/ml/EM.pdf
Khetan, Lipton & Anandkumar, Learning From Noisy Singly-Labeled Data: https://openreview.net/pdf?id=H1sUHgb0Z
So the answers are:
1) Properly interpreted, yes, this is EM (strictly, with hard truth assignments it is the "hard EM" variant; see the note above the code sketch).
2) EM in general converges to a local optimum: each iteration can only increase (or leave unchanged) the observed-data likelihood, $\ell(\theta_{k+1}) \ge \ell(\theta_k)$, so it converges, but only to a stationary point, not necessarily the global optimum.
3) Not in general. In this problem, though, since the sources are 97%+ accurate, I expect the estimates to be pretty good.
4) I don't think this is an issue. The $n$ truths are latent variables that EM marginalizes over, not free parameters, so only the $6m$ confusion-matrix entries are really being estimated, and that count doesn't grow with $n$. The sample sizes I deal with are in the 100,000s+, so this shouldn't be an issue.