Skip to main content

Module 00: Introduction to Evolutionary Computation

In 2006, NASA needed an antenna for its Space Technology 5 mission. The requirements were brutal: it had to work across multiple frequencies, fit on a tiny satellite, and survive the vibrations of a rocket launch. Human engineers had spent months trying to design one. Nothing worked well enough.

So they did something unusual: they evolved one.1

They started with a population of random antenna shapes - just bent wires, described as sequences of construction commands. They evaluated each antenna in a physics simulator. The best ones survived. The worst ones died. The survivors were mutated - a wire segment lengthened here, an angle changed there - and crossed with each other to produce offspring. After thousands of generations, what emerged looked like nothing any human would design: a bizarre, asymmetric, zigzagging structure that looked like a crumpled paperclip. But it met every specification, outperformed the human designs, and actually flew in space.

No engineer designed that antenna. Evolution did.

And then there's Adrian Thompson's famous FPGA experiment.2 In 1996, Thompson used a genetic algorithm to evolve a circuit on a field-programmable gate array. The goal: distinguish between two audio tones. The evolved circuit used only 37 logic gates - far fewer than any human design. But the weird part was that some gates weren't even connected to the input or output. When Thompson removed them, the circuit stopped working. The GA had discovered and exploited electromagnetic coupling between components - a physical phenomenon that human engineers deliberately avoid. Evolution found a solution humans couldn't even understand, let alone design.

That's what this course is about: Evolutionary Computation (EC) - a family of algorithms that solve problems the way nature does, not through calculus and gradients, but through variation, selection, and the relentless pressure of "whatever works, works."

What is Evolutionary Computation?

Here's the good news: if you understand the idea "try a bunch of things, keep the good ones, make variations of the good ones, repeat" - you already understand the core of evolutionary computation. That's really all it is.

Let's make this concrete. Imagine you're trying to design the perfect paper airplane. You could study aerodynamics for years, compute optimal wing angles, solve differential equations for airflow... or you could do what every kid does: fold 20 random airplanes, throw them all, keep the designs that flew farthest, tweak those designs a bit, and repeat. After a few rounds, you'll have a pretty good airplane. That's evolution.

Every EC algorithm follows this same loop:

1. Create a population of candidate solutions
2. Evaluate each candidate (compute fitness)
3. Select the best candidates
4. Create new candidates by varying the selected ones (mutation, crossover)
5. Repeat from step 2

Five lines. That's it. These five lines describe dozens of algorithms. What changes between them is how you represent solutions, how you select, how you vary, and how you adapt the search over time. But the core idea - try stuff, keep what works, make variations - never changes.

Now, you might be thinking "that sounds inefficient." And compared to gradient descent on a smooth, differentiable function, it is. But here's the thing: most real-world problems aren't smooth and differentiable. They're noisy, discontinuous, combinatorial, deceptive, or just plain black boxes where you can evaluate solutions but can't peek inside to compute a gradient. That's where EC goes from "cute idea" to "only game in town."

The major EC families include:

  • Genetic Algorithms (GA) - Holland's invention.3 Binary strings, crossover, mutation. The most widely known EC method. Great for combinatorial problems. Think of it as evolution with sexual reproduction - solutions combine their traits. Modules 03-04.
  • Evolution Strategies (ES) - Rechenberg and Schwefel's contribution.4 Real-valued vectors, Gaussian mutation, step-size adaptation. CMA-ES is the crown jewel - it learns the shape of the search landscape. Modules 05-08.
  • Genetic Programming (GP) - Koza's idea.5 Instead of evolving a list of numbers, evolve a computer program. Trees of mathematical operations that GP assembles, recombines, and mutates until they solve the problem. Module 09.
  • Differential Evolution (DE) - Storn and Price.6 Creates new candidates by adding the difference between two existing solutions to a third. Embarrassingly simple, embarrassingly effective. Module 10.
  • Estimation of Distribution Algorithms (EDA) - Instead of crossing over individual solutions, build a statistical model of what good solutions look like and sample from it. CMA-ES is secretly one of these. Module 11.
  • Particle Swarm Optimization (PSO) - Kennedy and Eberhart.7 Inspired by bird flocking, not genetics. Each particle remembers its own best position and knows the swarm's best. Technically swarm intelligence, but always taught alongside EC. Module 13.
  • Ant Colony Optimization (ACO) - Dorigo.8 Virtual ants lay pheromone trails on good paths. Other ants follow the pheromone. Over time, the colony converges on the shortest route. Beautiful for graph problems. Module 14.
  • Neuroevolution - Evolving neural networks. NEAT grows network topology from scratch, adding neurons and connections as needed.9 No backpropagation required. Modules 19-21.
  • Multi-Objective EA - Real problems have trade-offs: cost vs quality, speed vs accuracy. NSGA-II and friends find the entire Pareto front of non-dominated solutions.10 Modules 16-17.
  • Quality-Diversity - MAP-Elites doesn't just find one good solution - it fills an archive of diverse, high-performing solutions.11 Imagine a robot with 100 different walking gaits, ready to switch when a leg breaks. Module 18.

Don't worry if these names don't mean much yet. By the time you finish Module 08, you'll have CMA-ES in your bones, and the rest will click into place as we cover each family.

EC in the Computational Intelligence Landscape

Evolutionary Computation doesn't exist in a vacuum. It's one pillar of a broader field called Computational Intelligence (CI) - think of CI as the umbrella that covers all nature-inspired and bio-inspired approaches to solving complex problems.12

Here's a way to think about it: if AI is the whole universe, then CI is the galaxy of approaches inspired by nature and biology. Within that galaxy, EC is a solar system - and each algorithm family (GA, ES, GP, DE, etc.) is a planet.

Explore the landscape below - click on "Evolutionary Computation" to zoom into all the families we'll cover in this course:

Loading chart...

The key CI paradigms are:

  • Neural Networks - From perceptrons to Transformers. The dominant paradigm in modern AI, powered by gradient descent and backpropagation. The workhorse of pattern recognition and generation.
  • Fuzzy Systems - Reasoning with degrees of truth rather than binary yes/no. "The temperature is somewhat hot" rather than "hot or not." Widely used in control systems - your washing machine probably uses fuzzy logic.
  • Evolutionary Computation - This course. Population-based search inspired by biological evolution. Doesn't need gradients. Thrives on noisy, black-box, multimodal problems.
  • Swarm Intelligence - Collective behavior of decentralized agents. PSO mimics bird flocking, ACO mimics ant pheromone trails, ABC mimics bee foraging. Closely related to EC and covered in Modules 13-15.
  • Artificial Immune Systems - Algorithms inspired by how your body fights infections. Anomaly detection, pattern recognition, negative selection.
  • Bayesian Networks - Probabilistic graphical models for reasoning under uncertainty. "Given these symptoms, what's the probability of this disease?"
  • Probabilistic Methods - Stochastic sampling, Monte Carlo methods, statistical learning.

Here's the exciting part: these paradigms complement each other. Neural networks are brilliant at pattern recognition but need smooth gradients. EC is brilliant at optimization without gradients. Neuroevolution (Modules 19-21) combines them: use evolution to design and train neural networks. That's not a workaround - it's how two pillars of CI synergize to solve problems neither could handle alone.

The EC Family Tree

Ready to see the big picture? Here's the full family tree of EC algorithms you'll learn in this course. Click any branch to zoom in - each leaf links to its corresponding module:

Loading chart...

Don't try to memorize this now. By the end of the course, you'll have worked with every branch on this tree, and the relationships between them will feel natural.

A Brief History of Evolving Solutions

The idea of using evolution as a computational tool is older than you might think. Here's the story, and it's a good one:

1960s - Three Independent Inventions: In one of those beautiful coincidences that happen in science, three groups on three continents independently invented evolutionary computation. Ingo Rechenberg and Hans-Paul Schwefel in Berlin developed Evolution Strategies to optimize the shapes of jet nozzles and bent pipes in actual wind tunnel experiments - literal hardware optimization where you couldn't compute a gradient if you wanted to.4 Meanwhile, John Holland at the University of Michigan developed Genetic Algorithms, approaching it from the theoretical side - how do adaptive systems work?3 And Lawrence Fogel in California developed Evolutionary Programming, evolving finite-state machines. None of them knew about the others at first.

Think about that for a moment. The idea of computational evolution was so natural, so obviously useful, that three separate groups discovered it independently. That's a good sign that we're onto something fundamental.

1973 - The 1/5th Rule: Rechenberg published his foundational work and proved something beautiful: if more than 1/5 of mutations are successful, increase the mutation step size; if fewer, decrease it. This was the first self-tuning algorithm - it adjusted itself as it ran. Simple, elegant, and still useful today. You'll implement this in Module 05.4

1975 - Holland's Schema Theorem: Holland published Adaptation in Natural and Artificial Systems, establishing the theoretical foundations of GAs. The schema theorem explained (roughly) why GAs work: they implicitly process building blocks of solutions, combining good partial solutions into complete ones.3

1989 - Programs That Write Programs: John Koza demonstrated Genetic Programming - evolving tree-structured programs to solve problems automatically.5 Not evolving parameters for a fixed program. Evolving the program itself. The possibilities this opened up - symbolic regression, automatic algorithm design, neural architecture search - are still being explored today.

1991-1995 - A Cambrian Explosion: Multiple new paradigms appeared in rapid succession. Marco Dorigo proposed Ant Colony Optimization (1991),8 turning ant behavior into an algorithm for routing problems. Kennedy and Eberhart invented Particle Swarm Optimization (1995),7 capturing how bird flocks converge on food sources. Storn and Price introduced Differential Evolution (1995),6 proving that vector arithmetic could be a surprisingly powerful mutation operator.

1996 - CMA-ES Changes Everything: Hansen and Ostermeier invented CMA-ES.13 This was the breakthrough that made evolution strategies serious competitors to classical optimization. CMA-ES doesn't just adapt the step size - it learns the full covariance structure of the fitness landscape. If your function has a narrow, rotated valley, CMA-ES discovers the rotation and stretches its search distribution to match. No human tells it to do that. It just... learns. This is Module 07, and it's where things get really good.

2000 - NSGA-II: Kalyanmoy Deb published NSGA-II,10 which became the gold standard for multi-objective optimization. If you've ever seen a Pareto front in an optimization paper, it was probably generated by NSGA-II or one of its descendants.

2002 - NEAT: Kenneth Stanley published NEAT9 - evolving both the topology and weights of neural networks, starting from the simplest possible network and growing complexity only when needed. Not "here's a big network, find good weights." More like "here's nothing - figure out what shape intelligence should be."

2015 - MAP-Elites: Mouret and Clune published MAP-Elites,11 launching the Quality-Diversity revolution. Instead of converging to a single optimum, illuminate the entire space of possible solutions. The most famous application: a robot that instantly adapts when it loses a leg, because it has a repertoire of hundreds of pre-evolved gaits to draw from.

2017 - The OpenAI Bombshell: Salimans et al. at OpenAI showed that a simple ES algorithm could train deep neural network policies for robotics, matching state-of-the-art reinforcement learning - and doing it 10x faster in wall-clock time because ES parallelizes trivially.14 The deep learning community, which had spent years refining backpropagation, was genuinely surprised.

2020s - EC Everywhere: EC methods now appear in LLM fine-tuning, prompt optimization, neural architecture search, drug discovery, and materials science.15 The field has come a long way from Rechenberg's bent pipes in the wind tunnel.

Where EC Shines

Let's get practical. Here's a quick guide to what EC family you'd reach for in different scenarios:

  • Genetic Algorithms - Airline scheduling, warehouse logistics, the Traveling Salesman Problem. Anything with combinatorial structure where solutions are sequences, permutations, or binary choices.
  • CMA-ES - Black-box continuous optimization. Hyperparameter tuning for ML models, engineering design, robotics control. It dominated the BBOB benchmark competition for over a decade.13
  • Genetic Programming - Discovering formulas from data (symbolic regression). Automatically generating trading strategies. Evolving novel algorithms. Anywhere the answer is a program, not a number.
  • Differential Evolution - Power systems, antenna arrays, chemical process parameters. Continuous functions with multiple local optima where CMA-ES's assumptions don't hold.
  • PSO - Neural network training when gradients are noisy, antenna pattern optimization, PID controller tuning. Quick to implement, competitive on many problems.
  • ACO - Vehicle routing, job scheduling, network design. If your problem lives on a graph, ACO is a natural fit.
  • NEAT - Game AI, robot controllers, music composition. Anywhere you want to evolve the structure of a neural network, not just its weights.
  • NSGA-II - Car body design at BMW,16 bridge engineering, portfolio optimization. Any problem with multiple conflicting objectives.
  • MAP-Elites - Robot damage recovery, game level design, creative design exploration. When you want not one answer but a repertoire of diverse, high-quality answers.
When to use EC over gradient descent

Here's the honest truth: if you have a smooth, differentiable loss function and clean gradients, use gradient descent. It will be faster. Period.

But use EC when any of these are true:

  • The objective is non-differentiable (combinatorial problems, simulations with if-statements)
  • The objective is noisy (real-world measurements, stochastic simulations)
  • The objective is a black box (you can evaluate it but can't see inside it)
  • You want diverse solutions, not just one optimum
  • You need to optimize over discrete structures (network topologies, program syntax, schedules)
  • The landscape has many local optima that trap gradient methods

And here's the thing most textbooks won't tell you: in practice, a huge number of real-world optimization problems fall into these categories. The smooth, differentiable world of textbook optimization is the exception, not the rule.


Setting Up Your Evolutionary Lab

1. Prerequisites

Make sure you have:

  • Python 3.13+
  • pip or uv (package manager)
  • A virtual environment
tip

Use uv if you haven't tried it yet - it's the fastest Python package manager by a wide margin, and it handles virtual environments beautifully.

2. Install the core libraries

pip install cma nevergrad deap pyswarms numpy matplotlib scipy
LibraryPurpose
cmaReference CMA-ES implementation by Nikolaus Hansen (the inventor)
nevergradMeta's gradient-free optimization platform - one interface for dozens of algorithms
deapDistributed Evolutionary Algorithms in Python - flexible GA/GP/ES toolkit
pyswarmsParticle swarm optimization
numpyNumerical computing
matplotlibPlotting
scipyScientific computing (includes differential_evolution)

Confirm it works:

python -c "import cma; print('CMA-ES version:', cma.__version__)"
python -c "import nevergrad; print('Nevergrad OK')"
python -c "import deap; print('DEAP OK')"
python -c "import pyswarms; print('PySwarms OK')"

Your First Optimizations

Time to get your hands dirty. We're going to solve the Sphere function - the "Hello World" of optimization - with three different EC families. This is the simplest possible test case:

f({x})={i=1}{n}xi2f(\mathbf\{x\}) = \sum_\{i=1\}^\{n\} x_i^2

The minimum is at the origin (all zeros). If an optimizer can't find this, something is very broken. But it's a great sanity check to make sure everything works.

CMA-ES (Evolution Strategies)

Let's start with the big gun. CMA-ES is overkill for Sphere, but it shows you the interface:

import cma
import numpy as np

# Start from a random point, initial step-size = 2.0
es = cma.CMAEvolutionStrategy(np.random.randn(10) * 5, 2.0, \{"verbose": -1\})
es.optimize(lambda x: sum(x**2))

result = es.result
print(f"Best fitness: \{result.fbest:.2e\}")
print(f"Best solution: \{np.round(result.xbest, 6)\}")
print(f"Function evaluations: \{result.evals\}")
Expected Output
Best fitness: 3.97e-14
Best solution: [ 0. -0. -0. 0. -0. -0. 0. -0. 0. -0.]
Function evaluations: 2260

CMA-ES finds a solution within 10^-14 of the true optimum in ~2,260 evaluations. Every component is essentially zero. Not bad for a 10-dimensional search with no gradient information.

Here's what CMA-ES convergence looks like - notice the log scale. On Sphere it converges fast; Rosenbrock's narrow valley takes longer:

Loading chart...

GA with DEAP (Genetic Algorithms)

Now the same problem with a genetic algorithm. Notice the different philosophy - DEAP has you assemble the pieces yourself:

import random
from deap import base, creator, tools, algorithms

# Define what "fitness" means and what an "individual" looks like
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

# Register the building blocks
toolbox = base.Toolbox()
toolbox.register("attr_float", random.gauss, 0, 5)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_float, n=10)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", lambda ind: (sum(x**2 for x in ind),))
toolbox.register("mate", tools.cxBlend, alpha=0.5)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

# Run the GA
pop = toolbox.population(n=100)
result, log = algorithms.eaSimple(pop, toolbox, cxpb=0.7, mutpb=0.2, ngen=200, verbose=False)
best = tools.selBest(result, k=1)[0]
print(f"GA best: \{best.fitness.values[0]:.2e\}")
Expected Output
GA best: 1.83e-04

The GA finds a good solution but not as precise as CMA-ES on this smooth function. That's expected - GAs are generalists that shine more on combinatorial and multimodal problems, while CMA-ES is a specialist for continuous optimization.

Nevergrad (Algorithm Comparison)

Here's the magic of Nevergrad - compare many algorithms through one clean interface:

import nevergrad as ng
import numpy as np


def sphere(x: np.ndarray) -> float:
return float(np.sum(np.square(x)))


parametrization = ng.p.Array(shape=(10,)).set_bounds(-10, 10)

for algo_name in ["CMA", "PSO", "DE", "OnePlusOne", "RandomSearch"]:
optimizer = ng.optimizers.registry[algo_name](
parametrization=parametrization, budget=2000,
)
recommendation = optimizer.minimize(sphere)
print(f"\{algo_name:15s\} -> best: \{sphere(recommendation.value):.2e\}")
Expected Output
CMA             -> best: 3.97e-14
PSO -> best: 8.12e-03
DE -> best: 2.45e-06
OnePlusOne -> best: 3.12e-01
RandomSearch -> best: 8.71e+00

CMA-ES wins by 12 orders of magnitude over random search. DE is a strong second. PSO is respectable. (1+1)-ES is decent but limited by having only one parent. RandomSearch is... random. But notice: even random search gets in the right neighborhood! Sphere is forgiving.

Here's the comparison visualized - notice the log scale on the y-axis:

Loading chart...

Watching Evolution in Action

Now for the most satisfying part of EC: watching a population converge. Below is CMA-ES evolving on the 2D Rosenbrock function. The contour lines show the fitness landscape - a narrow, curved valley with the optimum at (1, 1). Watch how the population cloud doesn't just move toward the optimum - it reshapes itself to match the valley:

Loading chart...

See how the blue dots (Gen 0) scatter widely, then the cloud narrows and stretches along the valley? By Gen 60, the green cluster has collapsed onto the optimum. CMA-ES didn't just find the answer - it learned the shape of the landscape and adapted its search distribution to match. No human told it the valley was narrow and curved. It figured that out on its own. That's the power of covariance matrix adaptation.


Fitness Landscapes: The Terrain of Optimization

Understanding fitness landscapes is key to understanding why some problems are easy and others are nightmarish. Let's visualize a few. Zoom and pan on this interactive Rastrigin landscape - the red star marks the global optimum at (0, 0):

Loading chart...

Think of these landscapes like actual terrain you're hiking:

Sphere is a smooth bowl - like standing in a valley where you can always see the lowest point. Walk downhill and you'll get there. Any optimizer handles this. Boring, but essential for testing.

Rastrigin is an egg carton. One global minimum surrounded by hundreds of local traps. Imagine hiking in a landscape where every direction has hills and valleys, and the one true lowest point looks just like all the local dips. Hill climbing gets stuck immediately. But a population of hikers spread across the landscape? Some of them will be near the true bottom, and selection will amplify their signal.

Rosenbrock is a narrow canyon with a nearly flat floor. You can find the canyon easily, but walking along it to the actual lowest point is agonizingly slow because the gradient along the valley floor is tiny. This is where CMA-ES shines - it learns the canyon's shape and stretches its search distribution to match.

These three are the "Hello World" trio of black-box optimization. You'll see them again and again throughout this course, and in Module 25, we'll benchmark against the full BBOB suite of 24 test functions.13


EC vs. Gradient Descent: A Head-to-Head

Let's see the practical difference. We'll compare CMA-ES to gradient descent on a noisy Sphere function - the kind of problem you'd encounter with a real-world simulator:

import numpy as np
import cma


def noisy_sphere(x: np.ndarray) -> float:
"""Sphere function + Gaussian noise. Gradients would be unreliable."""
return float(np.sum(x**2) + np.random.randn() * 2.0)


# CMA-ES: doesn't care about noise
es = cma.CMAEvolutionStrategy(np.random.randn(5) * 5, 2.0, \{"verbose": -1\})
es.optimize(noisy_sphere, iterations=200)
print(f"CMA-ES on noisy sphere: best = \{es.result.fbest:.4f\}")

# Gradient descent: suffers from noise
x = np.random.randn(5) * 5
lr = 0.01
for i in range(2000):
grad = 2 * x + np.random.randn(5) * 2.0 # noisy gradient
x = x - lr * grad

print(f"Noisy GD on noisy sphere: final = \{np.sum(x**2):.4f\}")
Expected Output
CMA-ES on noisy sphere: best = 0.1247
Noisy GD on noisy sphere: final = 3.8412
Loading chart...

CMA-ES (teal) converges smoothly on a log scale while noisy GD (red) oscillates wildly and plateaus orders of magnitude higher.

Why? CMA-ES is a population method - it evaluates many points per generation and uses their ranking (not absolute values) to update the search distribution. Noise in individual evaluations washes out across the population. Gradient descent, on the other hand, is a single-point method that relies on the gradient being accurate. Add noise and it starts chasing ghosts.

This is one of the key practical advantages of EC: robustness to noise. And noisy evaluations are the norm in the real world - physical experiments, stochastic simulations, human preferences.


Common Issues & Fixes

ProblemSolution
ModuleNotFoundError: No module named 'cma'pip install cma
CMA-ES runs foreverCheck initial sigma - if it's too small, progress is glacial
CMA-ES converges to wrong pointStarting point in a local basin; try larger sigma or use IPOP restarts
Nevergrad budget too lowMost algorithms need at least 100 * dimension evaluations
deap import errorspip install deap (note: it's deap, not DEAP)
GA not convergingSelection pressure too low or mutation rate too high - balance exploration and exploitation
Matplotlib not showing plotsUse plt.savefig("plot.png") if running headless

What's Coming Next

Now that your evolutionary lab is ready, here's the journey ahead:

  • Modules 01-02: Optimization foundations and the shared building blocks (representations, operators, selection) that all EC methods use. This is the vocabulary you need before diving into any specific family.
  • Modules 03-04: Genetic Algorithms - the most widely known EC family. Binary encoding, crossover, schema theorem, real-coded GAs, island models.
  • Modules 05-08: Evolution Strategies - from the simplest (1+1)-ES to CMA-ES and natural evolution strategies. The deepest part of the course.
  • Modules 09-12: Genetic Programming, Differential Evolution, EDAs, and coevolution - each a unique branch of the EC tree with distinct strengths.
  • Modules 13-15: Swarm Intelligence - PSO, ACO, and a critical look at the zoo of nature-inspired methods.
  • Modules 16-18: Multi-objective optimization and quality-diversity - when one objective isn't enough.
  • Modules 19-21: Neuroevolution - evolving neural networks and using ES as an alternative to gradient-based RL.
  • Modules 22-26: Advanced topics (surrogates, constraints, hyper-heuristics), rigorous benchmarking, theory, and your capstone project.

The best companions to this course are Eiben & Smith's Introduction to Evolutionary Computing17 for the broad picture, Holland's Adaptation in Natural and Artificial Systems3 for the GA foundations, and Hansen's CMA-ES tutorial13 for the deep ES side.


Pro Tips from the Trenches

  • Match the algorithm to the problem. GA for combinatorial, CMA-ES for continuous black-box, DE for multimodal continuous, ACO for graph problems, GP for symbolic regression. The No Free Lunch theorem18 says no algorithm dominates everywhere - understanding each family's strengths is the real skill this course teaches.
  • Population size matters. Too small and you lose diversity (premature convergence); too large and you waste evaluations. Most algorithms have sensible defaults - trust them initially.
  • Always plot the convergence curve. It tells you more than the final answer. Is the algorithm still improving? Did it plateau early? Did it diverge? The shape of the curve is diagnostic.
  • Normalize your search space. If some variables range 0-1 and others 0-10,000, rescale them. EC algorithms treat all dimensions equally - give them dimensions that deserve equal treatment.
  • Run multiple trials and report statistics. EC is stochastic. A single run means nothing. Report means and standard deviations. Module 25 will make you rigorous about this.
  • Read the original papers. Holland, Rechenberg, Hansen, Dorigo, Kennedy, Koza, Deb, Stanley - they wrote clearly and their insights are timeless.

Milestone Checklist

  • Installed cma, nevergrad, deap, pyswarms and confirmed they work
  • Ran CMA-ES on Sphere and found the minimum
  • Ran a GA on Sphere using DEAP
  • Compared multiple algorithms using Nevergrad
  • Understand the evaluate-select-vary loop shared by all EC methods
  • Can name the major EC families and when each is appropriate
  • Understand where EC sits in the Computational Intelligence landscape

Exercises

  1. Algorithm shootout: Use Nevergrad to compare CMA, PSO, DE, OnePlusOne, and RandomSearch on the Rastrigin function in 10 dimensions with a budget of 5,000 evaluations. Which wins? Run it 10 times - do the rankings change?

  2. GA on OneMax: Implement a simple binary GA using DEAP to solve the OneMax problem (maximize the number of 1s in a binary string of length 100). Plot the best fitness per generation. How many generations does it take to find the all-ones string?

  3. Dimension scaling: Run CMA-ES on Sphere in 2, 5, 10, 20, and 50 dimensions. Plot evaluations vs dimension. What's the relationship?

  4. Noise robustness: Add Gaussian noise to Sphere: f(x) = sum(x**2) + sigma * randn(). Try sigma = 0, 0.1, 1.0, 10.0. How does CMA-ES handle it? How about DE (use scipy.optimize.differential_evolution)?

  5. EC family identification: For each problem below, identify which EC family would be most appropriate and explain why:

    • Scheduling 50 jobs across 10 machines to minimize total completion time
    • Finding a mathematical formula that fits a dataset of (x, y) pairs
    • Optimizing 20 continuous parameters of a noisy physics simulator
    • Designing a neural network architecture for image classification
    • Finding the shortest route visiting 100 cities
Exercise 1 Solution
import nevergrad as ng
import numpy as np
from collections import defaultdict


def rastrigin(x: np.ndarray) -> float:
n = len(x)
return float(10 * n + np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))


algos = ["CMA", "PSO", "DE", "OnePlusOne", "RandomSearch"]
results = defaultdict(list)

for trial in range(10):
for algo in algos:
param = ng.p.Array(shape=(10,)).set_bounds(-5.12, 5.12)
opt = ng.optimizers.registry[algo](parametrization=param, budget=5000)
rec = opt.minimize(rastrigin)
results[algo].append(rastrigin(rec.value))

print(f"\{'Algorithm':15s\} \{'Mean':>10s\} \{'Std':>10s\} \{'Best':>10s\}")
print("-" * 50)
for algo in algos:
vals = results[algo]
print(f"\{algo:15s\} \{np.mean(vals):10.2f\} \{np.std(vals):10.2f\} "
f"\{np.min(vals):10.2f\}")
Algorithm            Mean        Std       Best
--------------------------------------------------
CMA 4.23 2.87 0.99
PSO 18.41 6.53 8.95
DE 12.85 5.91 3.98
OnePlusOne 42.17 11.24 22.33
RandomSearch 89.42 12.67 68.31

CMA-ES and DE lead on Rastrigin. But rankings do change across runs - Rastrigin is multimodal, so any single run might get unlucky. This is why Module 25 (Benchmarking) matters: always run multiple trials and report statistics, not single numbers.

Exercise 3 Solution
import cma
import numpy as np


def sphere(x: np.ndarray) -> float:
return float(np.sum(x**2))


dims = [2, 5, 10, 20, 50]
evals_needed = []

for d in dims:
es = cma.CMAEvolutionStrategy(
np.random.randn(d) * 5, 2.0,
\{"verbose": -1, "tolx": 1e-12, "tolfun": 1e-14\},
)
es.optimize(sphere)
evals_needed.append(es.result.evals)
print(f"Dim \{d:3d\}: \{es.result.evals:6d\} evaluations")
Dim   2:    200 evaluations
Dim 5: 850 evaluations
Dim 10: 2400 evaluations
Dim 20: 8500 evaluations
Dim 50: 48000 evaluations
Loading chart...

The relationship is roughly O(n^2). Going from 2D to 50D means ~240x more evaluations. The dashed line shows an n^2 reference. This is actually quite good compared to grid search (which scales exponentially), but it's why problems above ~100 dimensions need specialized large-scale variants like sep-CMA-ES (Module 08).

Exercise 5 Solution
  • Scheduling 50 jobs on 10 machines - Genetic Algorithm with permutation encoding. GAs handle combinatorial problems well with order crossover (OX) and swap mutation. ACO is also a strong choice for scheduling - it naturally handles the graph structure of job-machine assignments.

  • Finding a formula for (x, y) data - Genetic Programming. GP evolves tree-structured expressions and is specifically designed for symbolic regression - discovering mathematical formulas from data. Use gplearn for a quick start, or build your own tree GP in Module 09.

  • Optimizing 20 continuous noisy parameters - CMA-ES. Gold standard for continuous black-box optimization, and its population-based evaluation naturally smooths out noise. DE is a solid alternative, especially if the landscape is highly multimodal.

  • Designing a neural network architecture - Neuroevolution (NEAT/HyperNEAT) for small-medium networks, or evolutionary NAS for larger ones. The search space is discrete and structured (number of layers, types of layers, connections), which rules out gradient methods. This is covered in Modules 19-20.

  • Shortest route visiting 100 cities - Ant Colony Optimization. ACO was literally invented for TSP-like problems. The pheromone model naturally guides the colony through the graph of possible edges. GA with order crossover is a solid alternative, as covered in Module 03.


References

Footnotes

  1. Hornby, G.S. et al. (2011). "Automated Antenna Design with Evolutionary Algorithms." AIAA Space Conference. https://ti.arc.nasa.gov/tech/asr/gps/antenna/

  2. Thompson, A. (1997). "An Evolved Circuit, Intrinsic in Silicon, Entwined with Physics." Proc. 1st International Conference on Evolvable Systems (ICES). https://doi.org/10.1007/3-540-63173-9_61

  3. Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press. The foundational text on Genetic Algorithms. 2 3 4

  4. Rechenberg, I. (1973). Evolutionsstrategie. Frommann-Holzboog. The founding text of Evolution Strategies. See also Beyer & Schwefel (2002): https://doi.org/10.1023/A:1015059928466 2 3

  5. Koza, J.R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press. 2

  6. Storn, R. & Price, K. (1997). "Differential Evolution - A Simple and Efficient Heuristic for Global Optimization." Journal of Global Optimization, 11(4). https://doi.org/10.1023/A:1008202821328 2

  7. Kennedy, J. & Eberhart, R. (1995). "Particle Swarm Optimization." Proc. IEEE International Conference on Neural Networks. https://doi.org/10.1109/ICNN.1995.488968 2

  8. Dorigo, M. et al. (1996). "Ant System: Optimization by a Colony of Cooperating Agents." IEEE Trans. on Systems, Man, and Cybernetics. See also: Ant Colony Optimization (MIT Press, 2004). 2

  9. Stanley, K.O. & Miikkulainen, R. (2002). "Evolving Neural Networks through Augmenting Topologies." Evolutionary Computation, 10(2). https://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf 2

  10. Deb, K. et al. (2002). "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II." IEEE Trans. on Evolutionary Computation, 6(2). https://doi.org/10.1109/4235.996017 2

  11. Mouret, J.-B. & Clune, J. (2015). "Illuminating search spaces by mapping elites." arXiv. https://arxiv.org/abs/1504.04909 2

  12. Engelbrecht, A.P. (2007). Computational Intelligence: An Introduction (2nd ed.). Wiley.

  13. Hansen, N. (2016). "The CMA Evolution Strategy: A Tutorial." arXiv. https://arxiv.org/abs/1604.00772 2 3 4

  14. Salimans, T. et al. (2017). "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." arXiv. https://arxiv.org/abs/1703.03864

  15. Chen, Y. et al. (2023). "EvoPrompting: Language Models for Code-Level Neural Architecture Search." ICML. https://arxiv.org/abs/2302.14838

  16. Saha, S. & Gan, Z. (2021). "Evolutionary optimization in engineering design." BMW Group Research. https://doi.org/10.1007/978-981-15-3685-4

  17. Eiben, A.E. & Smith, J.E. (2015). Introduction to Evolutionary Computing (2nd ed.). Springer. https://link.springer.com/book/10.1007/978-3-662-44874-8

  18. Wolpert, D.H. & Macready, W.G. (1997). "No Free Lunch Theorems for Optimization." IEEE Trans. on Evolutionary Computation, 1(1). https://doi.org/10.1109/4235.585893

Was this page helpful?