Finding synergies among observations of equal length
Assume we have a set $I$ of 20 distinct items (we call them $I_0$, $I_1$ up to $I_{19}$). We also have $n$ observations $O \in I^{n\times 8}$, so each observation is a subset of $I$ with exactly 8 items, and each observation is labeled with a score.
Just as an illustration, here are some made-up observations with their scores:
- $O_1=\{I_0, I_8, I_9, I_{10}, I_{14}, I_{15}, I_{16}, I_{17}\};s_1=0.995$
- $O_2=\{I_0, I_1, I_2, I_3, I_4, I_5, I_6, I_7\};s_2=0.667$
- $O_3=\{I_2, I_3, I_9, I_{15}, I_{16}, I_{17}, I_{18}, I_{19}\};s_3=0.1$
The goal is to identify, using an efficient algorithm, those subsets (of any length $\leq 8$) that have the best synergies according to the observations.
Strictly speaking, for each length $l$ with $1 \leq l \leq 8$, I want the first $t$ subsets of length $l$ from the complete list, sorted in descending order by the average score of the observations containing the subset divided by the average score of the observations not containing it.
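Written as a formula, the ranking criterion for a candidate subset $S$ is

$$r(S) = \frac{\operatorname{mean}\{\, s_i : S \subseteq O_i \,\}}{\operatorname{mean}\{\, s_i : S \not\subseteq O_i \,\}}$$

A minimal Python sketch that evaluates $r(S)$ for a single subset is shown below. The encoding of item $I_k$ as the integer `k`, the names `OBSERVATIONS` and `synergy_ratio`, and the choice to return `None` when either group is empty are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical encoding (assumption): item I_k is the integer k, and each
# observation is a (frozenset of items, score) pair. Data = the made-up examples.
OBSERVATIONS = [
    (frozenset({0, 8, 9, 10, 14, 15, 16, 17}), 0.995),
    (frozenset({0, 1, 2, 3, 4, 5, 6, 7}), 0.667),
    (frozenset({2, 3, 9, 15, 16, 17, 18, 19}), 0.1),
]


def synergy_ratio(subset, observations):
    """Mean score of observations containing `subset`, divided by the
    mean score of observations that do not contain it.

    Returns None when the ratio is undefined (an empty group or a zero
    denominator); this handling is an assumption, not part of the question."""
    subset = frozenset(subset)
    with_s = [score for items, score in observations if subset <= items]
    without_s = [score for items, score in observations if not subset <= items]
    if not with_s or not without_s:
        return None
    denominator = mean(without_s)
    if denominator == 0:
        return None
    return mean(with_s) / denominator
```

For example, `synergy_ratio({0}, OBSERVATIONS)` is $\frac{(0.995 + 0.667)/2}{0.1} = 8.31$, while a subset that occurs in no observation, such as `{11}`, yields `None`.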
Brute force over all subsets of $I$ of length $l$ would be easy to implement, but since $8$ and $20$ are only placeholders for much larger numbers, its performance would not be acceptable. Any suggestions?
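One possible speed-up is sketched below. It assumes that only subsets occurring in at least one observation matter, because any other subset has no "containing" observations and its ratio is undefined. Instead of enumerating all $\binom{20}{l}$ subsets of $I$ and checking each against all $n$ observations, the sketch enumerates the $l$-subsets of each observation, accumulating score sums and counts per subset in a single pass. The mean score *without* a subset then follows from the global total and $n$. This costs about $n\binom{8}{l}$ dictionary updates. The function name `top_synergies` is hypothetical, and the approach only helps while $n\binom{m}{l}$ (with $m$ the observation length) stays manageable. For much larger sizes, some form of pruning would probably be needed, for example a minimum-support threshold as in frequent itemset mining (Apriori, FP-growth), which is a different technique from the exact ranking asked for here.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def top_synergies(observations, length, t):
    """Return up to `t` (ratio, subset) pairs for subsets of size `length`,
    sorted by descending ratio (ties broken by the larger subset tuple).

    `observations` is a list of (set of hashable, sortable items, score) pairs.
    Only subsets occurring in at least one observation are enumerated."""
    n = len(observations)
    total = sum(score for _, score in observations)
    sums = defaultdict(float)  # subset -> sum of scores of observations containing it
    counts = defaultdict(int)  # subset -> number of observations containing it
    for items, score in observations:
        # Sorting makes each subset a canonical tuple key across observations.
        for combo in combinations(sorted(items), length):
            sums[combo] += score
            counts[combo] += 1

    def ratios():
        for combo, count in counts.items():
            rest = n - count
            if rest == 0:
                continue  # subset is in every observation: ratio undefined
            mean_without = (total - sums[combo]) / rest
            if mean_without == 0:
                continue  # avoid division by zero
            yield sums[combo] / count / mean_without, combo

    # nlargest keeps only t items in a heap instead of sorting every subset.
    return heapq.nlargest(t, ratios())
```

With the three made-up observations (encoded as integer sets), `top_synergies(obs, length=1, t=1)` returns `(I_0,)` with ratio $8.31$. For `length=3`, the top entries are 3-subsets of $O_1$ that do not also lie in $O_3$, each with ratio $0.995 / 0.3835 \approx 2.59$.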