Module 21: ES for RL & Policy Optimization - The OpenAI Bombshell
In 2017, OpenAI (Salimans et al., "Evolution Strategies as a Scalable Alternative to Reinforcement Learning") showed that a simple ES is competitive with deep RL on MuJoCo and Atari tasks, with near-linear speedups across hundreds of CPU workers. No value functions, no replay buffers, no backpropagation through time. Just evolution. This module shows how and why.
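The core trick is to treat the policy's return as a black-box function of its parameters and estimate a gradient purely from perturbed evaluations. Below is a minimal sketch of that estimator with antithetic (mirrored) sampling, in the style of the 2017 paper; the function names and the toy quadratic "fitness" are illustrative, not from the original code.

```python
import numpy as np

def es_gradient(fitness_fn, theta, sigma=0.1, n_pairs=50, rng=None):
    """Estimate grad of E[fitness(theta + sigma * eps)] over Gaussian eps,
    using antithetic pairs (+eps, -eps) to reduce variance."""
    if rng is None:
        rng = np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        # Each mirrored pair contributes a finite-difference-style term.
        f_plus = fitness_fn(theta + sigma * eps)
        f_minus = fitness_fn(theta - sigma * eps)
        grad += (f_plus - f_minus) * eps
    return grad / (2 * n_pairs * sigma)

# Usage: ascend a toy quadratic fitness toward its maximum at [1, -1].
target = np.array([1.0, -1.0])
fitness = lambda w: -np.sum((w - target) ** 2)
theta = np.zeros(2)
for _ in range(200):
    theta += 0.1 * es_gradient(fitness, theta)
```

In a real RL setting, `fitness_fn` would roll out the perturbed policy in an environment and return the episode's total reward; everything else stays the same.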
Learning Objectives
- Understand policy optimization as black-box optimization
- Implement OpenAI ES with shared noise tables
- Code fitness shaping for variance reduction
- Compare ES vs PPO on simple control tasks
- Understand when ES beats gradient-based RL (and when it doesn't)
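One objective above, shared noise tables, deserves a concrete picture before the full material lands. Every worker regenerates the same large array of Gaussian noise from a fixed seed, so a perturbation can be communicated as a single integer offset instead of a full parameter-sized vector. A minimal sketch, with sizes and names chosen for illustration (the real implementation uses a much larger table):

```python
import numpy as np

class NoiseTable:
    """Shared noise: identical on every process because the seed is fixed."""
    def __init__(self, size=1_000_000, seed=42):
        self.noise = np.random.default_rng(seed).standard_normal(size).astype(np.float32)

    def get(self, index, dim):
        # A perturbation is just a contiguous slice of the table.
        return self.noise[index:index + dim]

    def sample_index(self, dim, rng):
        return int(rng.integers(0, len(self.noise) - dim + 1))

# Worker side: pick an index, evaluate theta + sigma * eps, report (index, fitness).
table = NoiseTable()
rng = np.random.default_rng(0)
dim = 4
idx = table.sample_index(dim, rng)
eps = table.get(idx, dim)

# Learner side: rebuild the identical eps from the integer index alone.
reconstructed = NoiseTable().get(idx, dim)
```

This is why ES communication cost stays tiny as the worker count grows: each evaluation is summarized by one integer and one scalar return.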
Concept Explanation
Coming soon.
Code Examples
Coming soon.
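One example worth sketching now is fitness shaping, listed in the objectives above: instead of using raw episode returns, OpenAI ES replaces them with centered ranks, which caps the influence of outlier returns and reduces gradient variance. A minimal version (helper name is mine):

```python
import numpy as np

def centered_ranks(fitness):
    """Map raw returns to evenly spaced values in [-0.5, 0.5] by rank.
    Only the ordering of returns survives, so one huge return
    cannot dominate the gradient estimate."""
    ranks = np.empty(len(fitness), dtype=np.float64)
    ranks[np.argsort(fitness)] = np.arange(len(fitness))
    return ranks / (len(fitness) - 1) - 0.5

# Usage: the outlier return 1000.0 gets the same weight (0.5)
# that any best-of-batch return would get.
shaped = centered_ranks(np.array([1.0, 2.0, 1000.0]))
```

The shaped values then replace `f_plus` / `f_minus` style raw returns when combining perturbations into an update.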
Exercises
Coming soon.
Milestone Checklist
- Implemented OpenAI ES for policy optimization
- Understand shared noise tables and communication efficiency
- Compared ES vs PPO
- Can explain the parallelization advantage of ES for RL