How to figure out what elements are missing from a set, based on other sets?

I would like to solve a problem where I have a set of sets of possible values, but some elements of some sets are corrupted/deleted, so I have to figure out the most probable candidate replacement for each corrupted value.

There is a set of possible elements: E1, E2, E3 ... E6.

I have a set of sets of elements without corruption. The presence/absence of the respective potential elements is represented with binary numbers:

     E1 E2 E3 E4 E5 E6
S1   1  0  1  0  0  0
S2   0  0  0  0  1  0
S3   0  0  1  0  0  0
...
S100 1  0  0  1  1  1

I also have a similar matrix of sets, each with a corrupted (deleted) element:

     E1 E2 E3 E4 E5 E6
A1   0  1  1  0  0  0
A2   0  0  0  0  0  1
A3   1  1  1  0  0  0
...
A100 1  1  0  1  0  1

We know that exactly one element is missing from each set in the second matrix, and we must give a probability for each possible element (E1 - E6) of being the deleted one.
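To make the required output concrete, here is a minimal sketch of a naive baseline, not a solution. It assumes that elements occur independently of each other and that the deleted element was picked uniformly from the original set, so it ignores exactly the co-occurrence information I would like to use. The function name `independence_baseline` and the smoothing parameter `alpha` are made up for illustration, and the toy data contains only the rows shown above.

```python
def independence_baseline(clean_sets, corrupted_row, alpha=1.0):
    """Probability of each element being the deleted one, assuming independence.

    Under independence, restoring element e multiplies the likelihood of the
    set by p_e / (1 - p_e), so that ratio is used as the score.
    alpha must be > 0 so that no smoothed frequency is exactly 0 or 1.
    """
    n = len(clean_sets)
    n_elements = len(corrupted_row)
    # Smoothed frequency of each element across the clean sets.
    freq = [
        (sum(row[j] for row in clean_sets) + alpha) / (n + 2 * alpha)
        for j in range(n_elements)
    ]
    # Only elements absent from the corrupted row can be the deleted one.
    scores = [
        p / (1 - p) if present == 0 else 0.0
        for p, present in zip(freq, corrupted_row)
    ]
    total = sum(scores)
    if total == 0:
        raise ValueError("the row already contains every element")
    return [s / total for s in scores]


# Toy data: only the clean rows shown in the question (S1, S2, S3, S100).
clean_sets = [
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1],
]
a1 = [0, 1, 1, 0, 0, 0]  # corrupted row A1
probs_a1 = independence_baseline(clean_sets, a1)
print(probs_a1)
```

Elements already present in the corrupted row get probability 0, and the remaining probabilities sum to 1. Anything better would have to replace the per-element frequency with a score that depends on the other elements in the row.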

I was considering market basket analysis/association rule learning, but as far as I can see, this only gives hints about associations between single elements (if E1 is present, then E2 is probably also present), whereas I want to take all items present into consideration to predict the missing one.

I was also thinking about using classical machine learning (as I have a training set in the form of the first matrix), but I'm struggling with the proper data representation for modeling (are the independent features the artificially corrupted binary rows, and is the target a multilabel classification?). In any case, dimensionality is high while the number of training instances is relatively small, so this may not be a good approach either.
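The representation I have in mind can be sketched as follows, as one possible reading rather than an established recipe: for every clean set, delete each of its elements in turn, keep the corrupted row as the feature vector and the index of the deleted element as the target. Since exactly one element is deleted, this makes the target a single class per row (multiclass) rather than multilabel. The helper name `make_training_pairs` is hypothetical.

```python
def make_training_pairs(clean_sets):
    """Build (corrupted_row, deleted_index) pairs by artificial corruption.

    Each clean set with k elements yields k training examples,
    one per element that could have been deleted.
    """
    features, targets = [], []
    for row in clean_sets:
        for j, present in enumerate(row):
            if present == 1:
                corrupted = list(row)  # copy so the clean row is untouched
                corrupted[j] = 0       # simulate the deletion of element j
                features.append(corrupted)
                targets.append(j)
    return features, targets


# Toy data: only the clean rows shown in the question (S1, S2, S3, S100).
clean_sets = [
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1],
]
X, y = make_training_pairs(clean_sets)
for features_row, target in zip(X, y):
    print(features_row, "-> deleted element index", target)
```

Any classifier with probabilistic output could then be fitted on `X` and `y`; at prediction time the probabilities of elements already present in the corrupted row would be set to zero and the rest renormalised. Note that a set with a single element produces an all-zero feature row, which carries no information beyond overall frequencies.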

I have also considered imputation methods for missing data, but the matrices are quite sparse, and I assume that classical imputation methods are unreliable when there are many NAs.

What would be a good way to approach this?
