Finding synergies among observations of equal length


Matmarbon

December 12, 2019, 13:07

Assume we have a set $I$ of 20 distinct items, which we call $I_0, I_1, \ldots, I_{19}$. We also have $n$ observations $O \in I^{n\times 8}$, so each observation is a subset of $I$ with exactly 8 items, and each observation is labeled with a score.

As an illustration, here are some made-up observations with their scores:

$O_1=\{I_0, I_8, I_9, I_{10}, I_{14}, I_{15}, I_{16}, I_{17}\};s_1=0.995$
$O_2=\{I_0, I_1, I_2, I_3, I_4, I_5, I_6, I_7\};s_2=0.667$
$O_3=\{I_2, I_3, I_9, I_{15}, I_{16}, I_{17}, I_{18}, I_{19}\};s_3=0.1$

The goal is to identify, with an efficient algorithm, the subsets (of any length $\leq 8$) that show the strongest synergies according to the observations.

Strictly speaking, for each length $l$ with $1 \leq l \leq 8$, I want the top $t$ subsets of length $l$ from a complete list ordered (descending) by the average score of the observations that contain the subset divided by the average score of the observations that do not contain it.
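To make the ranking criterion concrete: for a subset $S$, the ratio is $r(S) = \bar{s}_{\text{with}}(S) / \bar{s}_{\text{without}}(S)$, where $\bar{s}_{\text{with}}(S)$ is the mean score over observations with $S \subseteq O_i$ and $\bar{s}_{\text{without}}(S)$ the mean over the rest. The sketch below computes $r(S)$ and the brute-force ranking the question describes. It assumes items are represented by their integer index (so $I_8$ becomes `8`); the names `synergy_ratio` and `brute_force_top` are hypothetical, and skipping subsets whose ratio is undefined (contained in no observation, or in all of them) is my own assumption.

```python
from itertools import combinations
import math
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (set of item indices, score); I_8 is stored as 8.
Observation = Tuple[FrozenSet[int], float]

def synergy_ratio(subset: FrozenSet[int], observations: List[Observation]) -> float:
    """Mean score of observations containing `subset` divided by the mean
    score of observations not containing it; nan if the ratio is undefined."""
    with_s = [s for items, s in observations if subset <= items]
    without_s = [s for items, s in observations if not subset <= items]
    if not with_s or not without_s:
        return math.nan  # one of the two groups is empty
    mean_with = sum(with_s) / len(with_s)
    mean_without = sum(without_s) / len(without_s)
    if mean_without == 0:
        return math.inf if mean_with > 0 else math.nan
    return mean_with / mean_without

def brute_force_top(observations: List[Observation], num_items: int,
                    length: int, t: int) -> List[Tuple[float, Tuple[int, ...]]]:
    """Score every subset of {0, ..., num_items-1} of the given length
    and return the t highest (ratio, subset) pairs."""
    scored = []
    for combo in combinations(range(num_items), length):
        r = synergy_ratio(frozenset(combo), observations)
        if not math.isnan(r):  # skip subsets with an undefined ratio
            scored.append((r, combo))
    scored.sort(key=lambda e: e[0], reverse=True)
    return scored[:t]
```

This evaluates $\binom{|I|}{l}$ candidate subsets per length, each with a full pass over all $n$ observations, which is exactly the cost that becomes unacceptable once 20 and 8 are replaced by larger numbers.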

A brute-force search over all subsets of $I$ of length $l$ would be easy to implement, but since $8$ and $20$ are only placeholders for much larger numbers, its performance would not be acceptable. Any suggestions?
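One possible speed-up, sketched here as an assumption rather than a known answer: a subset that occurs in no observation has no "with" average, so only subsets of the observations themselves need scoring. Enumerating them costs at most $n \cdot 2^8$ updates in total instead of $\binom{|I|}{l}$ candidates per length, and the "without" mean follows from global totals as $(\sum_i s_i - \text{sum}_{\text{with}}) / (n - \text{count}_{\text{with}})$, so no second pass is needed. The function name `top_synergies` is hypothetical, items are again integer indices, and scores are assumed positive. This only helps while the observation length stays moderate; if it also grows, the $2^k$ factor dominates and some pruning (for example a minimum number of supporting observations) would be needed.

```python
from collections import defaultdict
from itertools import combinations
import heapq
from typing import Dict, FrozenSet, List, Tuple

Observation = Tuple[FrozenSet[int], float]  # (item indices, score)

def top_synergies(observations: List[Observation], max_len: int, t: int
                  ) -> Dict[int, List[Tuple[float, Tuple[int, ...]]]]:
    """For each length 1..max_len, return the t subsets with the highest
    (mean score with subset) / (mean score without subset).
    Assumes all scores are positive."""
    n = len(observations)
    total = sum(s for _, s in observations)
    count: Dict[Tuple[int, ...], int] = defaultdict(int)
    score_sum: Dict[Tuple[int, ...], float] = defaultdict(float)

    # Only subsets occurring in some observation can have a 'with' mean,
    # so enumerate the subsets of each observation (<= 2^k for k items).
    for items, s in observations:
        ordered = sorted(items)  # canonical order: one key per subset
        for l in range(1, max_len + 1):
            for combo in combinations(ordered, l):
                count[combo] += 1
                score_sum[combo] += s

    by_len: Dict[int, List[Tuple[float, Tuple[int, ...]]]] = defaultdict(list)
    for combo, c in count.items():
        if c == n:
            continue  # contained in every observation: no 'without' group
        mean_with = score_sum[combo] / c
        # 'without' statistics derived from the global totals.
        mean_without = (total - score_sum[combo]) / (n - c)
        if mean_without <= 0:
            continue  # guard for inputs violating the positive-score assumption
        by_len[len(combo)].append((mean_with / mean_without, combo))

    # Keep only the t best per length.
    return {l: heapq.nlargest(t, by_len[l], key=lambda e: e[0])
            for l in range(1, max_len + 1)}
```

For the three example observations (with items given by index), `top_synergies(obs, max_len=2, t=1)[1]` ranks the single item $I_0$ first, with a ratio of about 8.31, matching the brute-force result.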

Topic: efficiency

Category: Data Science
