Can Expectation Maximization estimate truth and confusion matrix from multiple noisy sources?

Suppose we have $m$ sources, each of which noisily observes the same set of $n$ independent events from the outcome set $\{A,B,C\}$. Each source has a confusion matrix; for example, for source $i$:

$$C_i = \begin{bmatrix} 0.98 & 0.01 & 0.07 \\ 0.01 & 0.97 & 0.00 \\ 0.01 & 0.02 & 0.93 \end{bmatrix}$$

where each column relates to the truth and each row to the observation. E.g., if the true event is $B$ then source $i$ will get it right 97% of the time, observe $A$ 1% of the time, and observe $C$ 2% of the time. We can assume the diagonal elements are > 95%.

Given a sequence of $n$ events, where each event $j$ was observed by source $i$ as $O_{i,j}$, it is trivial to estimate the pmf of the truth $T_j$ by computing $P(T_j \mid O_{1,j},\dots,O_{m,j})$ with Bayes' formula (given some reasonable priors over the events themselves, say uniform).
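
For concreteness, here is a minimal sketch of that posterior computation in Python/NumPy (my own illustration, not from the papers linked below), assuming outcomes are indexed $0,1,2$ for $A,B,C$, a uniform prior, and sources that are conditionally independent given the truth:

```python
import numpy as np

def truth_posterior(confusions, observations, prior=None):
    """Posterior pmf of the truth T_j for one event.

    confusions:   list of m (3x3) arrays, C[obs, truth] = P(observe obs | truth)
                  (rows = observation, columns = truth, as in the question).
    observations: the m observed outcome indices O_{1,j}, ..., O_{m,j}.
    prior:        pmf over the 3 outcomes; uniform if None.
    """
    k = confusions[0].shape[0]
    post = np.full(k, 1.0 / k) if prior is None else np.asarray(prior, float)
    for C, obs in zip(confusions, observations):
        post = post * C[obs, :]   # multiply in each source's likelihood
    return post / post.sum()      # normalize: P(T_j | O_{1,j}, ..., O_{m,j})

# e.g. two sources with the example matrix both report A (index 0):
# C = np.array([[0.98, 0.01, 0.07], [0.01, 0.97, 0.00], [0.01, 0.02, 0.93]])
# truth_posterior([C, C], [0, 0])  # -> heavily concentrated on A
```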

However, suppose we had neither the confusion matrices nor the ground truth, and instead wanted to estimate both. One algorithm (sketched in code just after the list) is:

  • Start with some reasonable initial confusion matrix $C_{i,0}$ for each source
  • Fixing the confusion matrices $C_{i,k}$, estimate the most likely truths $T_{j,k}$ using Bayes' formula
  • Fixing the truths $T_{j,k}$, estimate new confusion matrices $C_{i,k+1}$ based on how often each source got it "wrong" (allegedly)
  • Repeat the last two steps, incrementing $k$, until convergence
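
Here is a minimal sketch of that loop in Python/NumPy, under two assumptions of my own: step two uses hard assignments (the single most likely truth, as the bullet says) rather than the soft posteriors of full EM, and step three applies Laplace smoothing so no estimated probability collapses to zero:

```python
import numpy as np

def estimate_truths_and_confusions(obs, k=3, n_iter=50, smoothing=1.0):
    """obs[i, j] = outcome index that source i reported for event j (shape (m, n)).

    Returns hard truth estimates (shape (n,)) and confusion matrices
    (shape (m, k, k)) with C[i, o, t] = P(source i observes o | truth t).
    """
    m, n = obs.shape

    # Start near the identity: assume each source is usually right.
    C = np.full((m, k, k), 0.02 / (k - 1))
    for i in range(m):
        np.fill_diagonal(C[i], 0.98)

    truths = np.full(n, -1)
    for _ in range(n_iter):
        # Fixing C, pick the most likely truth per event (uniform prior cancels).
        log_post = np.zeros((n, k))
        for i in range(m):
            log_post += np.log(C[i, obs[i], :])
        new_truths = log_post.argmax(axis=1)

        # Fixing the truths, re-count each source's (alleged) errors.
        for i in range(m):
            counts = np.full((k, k), smoothing)                # Laplace smoothing
            np.add.at(counts, (obs[i], new_truths), 1.0)
            C[i] = counts / counts.sum(axis=0, keepdims=True)  # columns sum to 1

        if np.array_equal(new_truths, truths):                 # assignments stable
            break
        truths = new_truths

    return truths, C
```

Replacing the argmax with the full posterior (and the hard counts with posterior-weighted counts) turns this into the soft-EM procedure of the Dawid–Skene paper linked below.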

This seems like the EM algorithm, but I don't know how to show this formally. (No, this is not homework.)

1) Is this EM, or some other standard algorithm in data fusion?

2) Does it have convergence guarantees?

3) Does it have any guarantees about the quality of the solution and how well the final confusion matrices will approximate the true confusion matrices?

4) Are there issues about the number of parameters being estimated vs. the number of samples? E.g., it seems there are $n + 6m$ parameters being estimated: the $n$ truths and the $6m$ free elements across all confusion matrices (the last cell of each column is determined by the others).
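
Making that count explicit (a worked instance, with example numbers of my own choosing):

$$\underbrace{n}_{\text{truths}} \;+\; \underbrace{m \cdot 3(3-1)}_{\text{free confusion entries}} \;=\; n + 6m, \qquad \text{e.g. } n = 100{,}000,\; m = 5 \;\Rightarrow\; 100{,}030 \text{ parameters}.$$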

EDIT

These two papers describe exactly this problem and how to use EM to solve it:

A. P. Dawid and A. M. Skene, "Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm" (Applied Statistics, 1979): http://crowdsourcing-class.org/readings/downloads/ml/EM.pdf

A. Khetan, Z. C. Lipton, and A. Anandkumar, "Learning From Noisy Singly-Labeled Data" (ICLR 2018): https://openreview.net/pdf?id=H1sUHgb0Z

So the answers are:

1) Properly interpreted, yes, this is EM

2) In general, EM converges to a local optimum of the likelihood

3) Not really. In this problem, though, since the sources are 97%+ accurate, I expect the estimates to be pretty good

4) I don't think this is an issue: this is a "non-parametric" EM algorithm, as the confusion matrices are not parameterized in any way. The sample sizes I deal with are in the 100,000s+, so this shouldn't be an issue



I don't have a full answer to your question, but I wanted to help (and would love to know a complete answer). In the simpler case where you have a single source ($m = 1$), it seems to me you are describing a scenario that resembles a hidden Markov model, which has an EM-based solution (the Baum–Welch algorithm), provided you make the further hypothesis that the Markov property holds. Since Markov chains converge to their stationary distribution exponentially fast, convergence in practice may be limited by the implementation rather than by the theory. I hope that info helps point you in the right direction.
