Parallel active optimization

I'm trying to optimize an expensive function for which I can choose sample points. The difficulty is that many function evaluations may be computed in parallel, taking varying amounts of time. I don't know which keywords to search for to find existing literature(/implementations).

So at a time, I might have already computed function values at 18 points, with 15 still being computed, and I want to start evaluating the function another point. Without the running jobs, I could make a model and find which next point might provide the most information. But now, I need to somehow tell the model that there are also 15 points for which I don't have values, but near which I don't want to evaluate the function.

Specifically, I'm looking for something with minimal assumptions and no gradient information. But I'll be happy if I just know the keyword(s) to search for. (I can come up with several hacks to that kind of work, but I wonder if there is any 'real' solution).


For illustration, say every column is a function evaluation point and every row is a timestep in the below ascii-figure. An X means the evaluation is still being computed and a * in the bottom row means it's done. A new evaluation starts as soon as an old one terminates, with others still busy.

XXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXX
 XXXXXXXXXXX XXXXX
 XXXXXXXXXXX XXXXX
 XXXXXXXXXXX XXXXX
 XXXXXXXXXXX X  XXX X
 XXXX XXXXXX X  XXX XX
 XXXX XXXXXX X  XXX  XX
 XXXX XXXXXX X  XXX  XX
 XXXX XXXXXX X  XXX  XX
 XXXX XXXXXX X  XXX  XX
 XXXX X XXXX X  XXX  XXX
 XXXX X XXXX X  X X  XXXX
 XXXX X XXXX X  X X  XXXX
  XXX X XXXX X  X X  XXXXX
  XXX X XXXX X  X X  XXXXX
  XXX X XXXX X  X X  XXXXX
  XXX X XXXX X  X X  XXXXX
  XXX X XX X X  X X  XXXXXX
  XXX   XX X X  X X  XXXXXXX
  XXX   XX X X  X X  XXXXXXX
  XXX   XX X X  X X  XXXXXXX
  XXX   XX   X  X    X XXXXXXXX
  XXX   XX   X  X    X XXXXXXXX
  XXX   XX   X  X    X XXXXXX XX
  XXX   XX   X  X    X XXXXXX XX
  XXX   XX   X       X XXXXXX XXX
  XX    XX   X       X XXXXXX XXXX
**  ****  *** ******* *      *

Each time an evaluation completes, how to choose the next point to evaluate, given several complete and several incomplete evaluations?

Topic sampling optimization parallel

Category Data Science


SMAC has a parallel version available.

I'm not sure it handles the single runs as you want, but it sounds strange to me if it don't. Give a try!


I would use a parallelized Gaussian process regression model. The next points to query in this framework are given by the posterior extrema or those of maximum expected improvement. Here's an implementation.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.