Understanding genetic algorithms

What is a genetic algorithm, and what are its practical advantages over other algorithms? Is it similar to any commonly used machine learning algorithm like linear/logistic regression, neural networks, or tree-based methods like gradient boosting and random forests? I've heard it's based on "mutated" combinations of other models. Does this make it more like an ensemble?

Topics: genetic-algorithms, machine-learning

Category: Data Science


The genetic algorithm is one of the most popular evolutionary computing methods. It was invented by John Holland at the University of Michigan in the 1960s, but it did not become popular until the 1990s. Its main purpose is to solve problems where deterministic algorithms are too costly.

Genetic algorithms fall under metaheuristics: high-level search strategies that are problem-independent and can be applied to a wide range of problems. These algorithms are quite different from machine learning methods and are mostly used to solve optimization problems in science and engineering. However, the applications of genetic algorithms are not limited to optimization.

Incorporating machine learning into a genetic algorithm is a common practice aimed at improving the GA's performance.

In the realm of machine learning, whenever you need to search a large space where exact optimization is intractable, a genetic algorithm is a viable choice due to its flexible and robust search capabilities. Note that a GA does not guarantee an optimal solution, let alone one in polynomial time; it trades that guarantee for broad applicability. Hence, genetic algorithms can be used as computational tools in machine learning.

Please let me know if my answer is clear enough.


A genetic algorithm is an algorithm based on biological evolution, on how nature evolves. It does exactly that: it evolves a population of candidate solutions so that the best one for the problem at hand emerges. You can use a genetic algorithm to find a solution to a problem whose answer you don't know, to find a different answer to one you do know, or simply because you don't want to design a specialized algorithm. The common steps for a genetic algorithm are:

  1. Generate a random population of elements
  2. Evaluate the fitness of each element (how good it is as a solution)
  3. Take the best elements for the next generation
  4. Generate child elements from the elements above
  5. Mutate (randomly) each child. This can also be done to the parents before step 4.
  6. Repeat from step 2 until the solution is found, the maximum number of generations has been reached, or the fitness has stopped improving (a local optimum).
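The steps above can be sketched in a few lines of Python. This is a minimal toy example, not a production implementation: it solves the classic "one-max" problem (maximize the number of 1 bits in a bitstring), and all the constants (population size, mutation rate, number of generations) are illustrative choices of mine.

```python
import random

random.seed(0)

GENOME_LEN = 20       # bits per individual
POP_SIZE = 30
GENERATIONS = 60
MUTATION_RATE = 0.02

def fitness(genome):
    # "One-max" toy problem: fitness = number of 1 bits (step 2).
    return sum(genome)

def crossover(a, b):
    # Single-point crossover between two parents (step 4).
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit with a small probability to keep diversity (step 5).
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

# Step 1: random initial population.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # Steps 2-3: evaluate fitness and keep the best half.
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    # Steps 4-5: breed children from the parents and mutate them.
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children  # step 6: next generation

best = max(population, key=fitness)
print(fitness(best))  # should be at or near GENOME_LEN
```

Keeping the parents in the next generation (elitism) guarantees the best fitness never decreases; the mutation step is what keeps the search from stalling.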

Two basic problems with GAs:

  • Getting trapped in a local minimum (or local maximum, depending on your point of view). This means you might find a good answer (or not even a good one), but the algorithm never reaches the best answer because fitness values stop improving. This is mitigated by the mutation in step 5, which keeps diversity in the population so the search doesn't get stuck.
  • They are generally slower than other approaches.

If you can, take a look at this book. It is not free, but worth a look. Alternatively, take a look at this online book for free; it is a great resource, and the author has his own YouTube channel. It's more basic than the other book I recommended, but it will help you get started with GAs.

To answer the other part of your question: a GA can be used to find a model, but it does not act as a model itself. For example, if you have a neural network you can train it using backpropagation, but you could also train it using a GA. The GA would not use any of the backpropagation mathematics; instead, it would generate candidate sets of weights for the neurons and evaluate the network's output (the last layer). The weights evolve to get closer to the solution. In this scenario, the model is still the NN; you simply used a different algorithm to find the best NN.
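To make the NN example concrete, here is a deliberately tiny sketch: the "network" is a single neuron with a step activation, and a GA evolves its weights to learn the AND function. Everything here (network size, mutation scale, the AND task) is an illustrative assumption, not how you would train a real network.

```python
import random

random.seed(1)

# Training data for the AND function. The "network" is one neuron:
# output = 1 if w1*x1 + w2*x2 + b > 0, else 0.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, x):
    w1, w2, b = weights
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def fitness(weights):
    # Number of correctly classified examples (higher is better).
    return sum(predict(weights, x) == y for x, y in DATA)

def mutate(weights):
    # Gaussian perturbation plays the role of backprop's weight update.
    return tuple(w + random.gauss(0, 0.3) for w in weights)

# Random initial population of weight vectors (w1, w2, b).
population = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(20)]

for _ in range(50):
    # Keep the 5 fittest weight vectors, refill the rest by mutation.
    population.sort(key=fitness, reverse=True)
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=fitness)
print(fitness(best))  # how many of the 4 examples the evolved neuron gets right
```

Note that no gradients are computed anywhere: the GA only needs to *evaluate* the network, which is why this approach also works when the fitness function is not differentiable.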

Hope this helps.


As for the machine learning algorithms you mentioned: regression and neural networks are formulated within an optimization framework, while tree-based methods are based on information gain.

A genetic algorithm (GA) is a stochastic search method. Given a value in the solution space, it mutates it to create several candidates. A criterion is used to evaluate each candidate, and only the few best ones are kept for further mutation. It can usually be proven that a GA converges to the best solution given enough iterations; in practice, however, that many iterations is infeasible. More discussion can be found here
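The mutate-evaluate-keep-the-best loop described above can be sketched as a minimal mutation-only search (a simplified variant without crossover; the objective function and all constants are illustrative):

```python
import random

random.seed(2)

def objective(x):
    # Toy criterion to minimize; the optimum is at x = 3.
    return (x - 3.0) ** 2

x = 0.0                      # starting value in the solution space
for _ in range(200):
    # Mutate the current value to create several candidates...
    candidates = [x + random.gauss(0, 0.5) for _ in range(10)]
    # ...evaluate each one, and keep the best for further mutation.
    best = min(candidates, key=objective)
    if objective(best) < objective(x):
        x = best

print(round(x, 2))  # converges toward 3.0
```

This illustrates the convergence point too: with enough iterations the loop gets arbitrarily close to the optimum, but nothing bounds how many iterations that takes.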

As for ensemble learning, it refers to (weighted) voting among multiple models. The two common approaches are bagging and boosting.
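To show why voting helps at all, here is a toy sketch (not real bagging or boosting, and the 70% accuracy figure is an invented example): several independent weak models that are each right only 70% of the time, combined by an unweighted majority vote.

```python
import random

random.seed(3)

def weak_model(x):
    # A simulated weak model: returns the true label (parity of x)
    # correctly with probability 0.7 (an illustrative number).
    true_label = x % 2
    return true_label if random.random() < 0.7 else 1 - true_label

def ensemble(x, n_models=5):
    # Unweighted majority vote over several independent weak models.
    votes = [weak_model(x) for _ in range(n_models)]
    return max(set(votes), key=votes.count)

data = list(range(1000))
single_acc = sum(weak_model(x) == x % 2 for x in data) / len(data)
ensemble_acc = sum(ensemble(x) == x % 2 for x in data) / len(data)
print(single_acc, ensemble_acc)  # the vote should beat a single model
```

Real bagging gets its independent models by training each on a bootstrap sample of the data, while boosting trains models sequentially, reweighting the examples the previous models got wrong.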
