Heuristics or methods to speed up searching over subsets of a big set (probably NP-hard combinatorial problem)

I have a reasonably sized set of N objects (say 10 000) in which I am searching for groups of compatible elements. That is, I have a function y = f(x_1, x_2, x_3, ..., x_n) returning a boolean 0/1 answer indicating whether the given n elements are compatible, and I want to run this search over every subset of N with fewer than 8 elements. This is obviously NP-hard, or close to it. Even the pairwise search already gives on the order of N^2 pairs to explore, which is still doable but grows quite fast for big sets; it then gets nasty when looking for triples (identifying 3-element cliques), quadruples and larger groups.
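For concreteness, here is a minimal Python sketch of the brute-force, level-wise enumeration I mean; `f` stands for my compatibility check, and the pruning is only valid if compatibility is hereditary (every subset of a compatible group is itself compatible), which I assume here just for illustration:

```python
from itertools import combinations

def level_wise_search(elements, f, max_size=4):
    """Brute-force, level-wise enumeration of compatible groups.

    `f(*xs)` is the (expensive) compatibility oracle.  The pruning assumes
    compatibility is hereditary: every subset of a compatible group is
    itself compatible.  Elements must be sortable.
    """
    elements = sorted(elements)
    # Level 2: the O(N^2) pairwise pass.
    groups = {2: [p for p in combinations(elements, 2) if f(*p)]}
    # Levels 3..max_size: extend only groups that already passed, adding
    # elements larger than the current maximum to avoid duplicates.
    for k in range(3, max_size + 1):
        groups[k] = []
        for g in groups[k - 1]:
            for e in elements:
                if e > g[-1] and f(*(g + (e,))):
                    groups[k].append(g + (e,))
    return groups

# Toy usage: "compatible" = pairwise coprime integers (a made-up oracle).
if __name__ == "__main__":
    from math import gcd
    def toy_f(*xs):
        return all(gcd(a, b) == 1 for a, b in combinations(xs, 2))
    print(level_wise_search(range(2, 15), toy_f, max_size=3)[3][:5])
```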

Would you suggest any machine learning approach to help with this task? Perhaps for efficiently sampling the search space and identifying which subsets to search, or for fast heuristic evaluations of the function to direct the search.
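To make the second idea concrete, the kind of thing I imagine is a sketch like the one below: it assumes each object can be given a feature vector, trains a cheap classifier on a small sample of exactly labelled pairs, and then uses the predicted scores to decide which candidate pairs deserve the expensive exact check. All names and parameters are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_pair_surrogate(objects, features, f, n_labelled=2000, seed=0):
    """Fit a cheap classifier predicting pairwise compatibility.

    `features` is an assumed (N, d) array of per-object feature vectors
    and `f` is the expensive exact compatibility check.  The labelled
    sample is assumed to contain both compatible and incompatible pairs.
    """
    rng = np.random.default_rng(seed)
    pairs = [tuple(rng.choice(len(objects), size=2, replace=False))
             for _ in range(n_labelled)]
    X = np.array([np.concatenate([features[i], features[j]]) for i, j in pairs])
    y = np.array([int(f(objects[i], objects[j])) for i, j in pairs])
    return LogisticRegression(max_iter=1000).fit(X, y)

def rank_candidate_pairs(clf, features, candidate_pairs):
    """Rank candidate pairs by predicted compatibility; only the top of
    the ranking would be passed to the real (expensive) f."""
    X = np.array([np.concatenate([features[i], features[j]])
                  for i, j in candidate_pairs])
    scores = clf.predict_proba(X)[:, 1]
    return [candidate_pairs[k] for k in np.argsort(-scores)]
```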

Topic feature-reduction grid-search dimensionality-reduction machine-learning

Category Data Science


This is not ML, but your setup reminds me of the Covering Array Problem, which is used in software testing.

If you consider your configuration arrays [0 0 1 0 1 0 1 0 1 ...] as tests to run, a typical question is: what is the minimum number of tests I need to run while still covering most of the different setups? This matters both for increasing test completeness and for minimizing the time it takes to run the tests.

Possibly some covering array solutions might help. There are even precomputed tables that you can use for different values of N and coverage strengths: https://math.nist.gov/coveringarrays/coveringarray.html
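To illustrate the flavour of such constructions (this is not the NIST arrays, just a naive greedy sketch with made-up parameters), the snippet below keeps adding test subsets until every pair of elements has appeared together in at least one of them:

```python
import random
from itertools import combinations

def greedy_pair_cover(elements, block_size, seed=0):
    """Naive greedy construction of 'test' subsets (blocks) such that every
    pair of elements occurs together in at least one block.  Only meant to
    show the flavour of covering constructions, not an optimal algorithm.
    """
    rng = random.Random(seed)
    elements = sorted(elements)
    uncovered = set(combinations(elements, 2))
    blocks = []
    while uncovered:
        a, b = next(iter(uncovered))                  # seed with an uncovered pair
        rest = [e for e in elements if e not in (a, b)]
        block = tuple(sorted((a, b) + tuple(rng.sample(rest, block_size - 2))))
        blocks.append(block)
        uncovered -= set(combinations(block, 2))      # mark newly covered pairs
    return blocks

# Example: cover all pairs of 30 elements with blocks of size 6.
# print(len(greedy_pair_cover(range(30), block_size=6)))
```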

And as you pointed out, this is a combinatorial problem that suffers from combinatorial explosion. You might want to divide and conquer (split the N elements into k manageable groups).
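A minimal sketch of that split, assuming you are willing to give up on groups that straddle two chunks (that loss is the price of tractability); `f` again stands for your compatibility check and `k`, `max_size` are illustrative parameters:

```python
from itertools import combinations

def split_and_search(elements, f, k=10, max_size=3):
    """Split the elements into k chunks and exhaustively test every subset
    of size 2..max_size inside each chunk.  Groups spanning several chunks
    are deliberately ignored."""
    elements = list(elements)
    chunk_len = -(-len(elements) // k)  # ceiling division
    chunks = [elements[i:i + chunk_len]
              for i in range(0, len(elements), chunk_len)]
    hits = []
    for chunk in chunks:
        for size in range(2, max_size + 1):
            hits.extend(g for g in combinations(chunk, size) if f(*g))
    return hits
```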
