
CMA-ES for Agent Training


What I worked on

Studied how CMA-ES can train an agent without backpropagation by optimizing a fitness function. Looked at how a linear policy maps features to actions and how it behaves after training.

What I noticed

  • The agent always moved forward and never ate, indicating a poor policy
  • CMA-ES optimizes a parameter vector directly, no neural net required
  • Policies are encoded as weight and bias vectors
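A policy encoded this way might look like the sketch below: a flat parameter vector is split into weights and biases, and the action is whichever linear score comes out highest. The feature and action counts are made-up values for illustration only.

```python
import random

# Hypothetical sizes, for illustration: 4 input features, 3 actions.
N_FEATURES, N_ACTIONS = 4, 3

def decode(params):
    """Split a flat parameter vector into a weight matrix and a bias vector."""
    w_end = N_FEATURES * N_ACTIONS
    weights = [params[i * N_FEATURES:(i + 1) * N_FEATURES]
               for i in range(N_ACTIONS)]
    bias = params[w_end:w_end + N_ACTIONS]
    return weights, bias

def act(params, features):
    """Linear policy: pick the action with the highest weighted score."""
    weights, bias = decode(params)
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return max(range(N_ACTIONS), key=lambda a: scores[a])

params = [random.gauss(0, 1) for _ in range(N_FEATURES * N_ACTIONS + N_ACTIONS)]
print(act(params, [0.5, -1.0, 0.2, 0.0]))  # an action index in {0, 1, 2}
```

Because the whole policy is just this flat list of numbers, an optimizer like CMA-ES can mutate and recombine it without ever computing a gradient.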

“Aha” Moment

CMA-ES treats the policy as nothing more than a parameter vector and evolves it purely by measuring fitness outcomes.
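That loop can be sketched in a few lines. This is a deliberately simplified (mu, lambda) evolution strategy, not full CMA-ES, which additionally adapts a full covariance matrix and its step size; here sigma just decays on a fixed schedule, and the fitness function is a hypothetical stand-in for running the agent in the environment.

```python
import random

def fitness(params):
    """Stand-in fitness: in the real setup this would run the agent in the
    environment and return a score such as food eaten or time alive."""
    target = [1.0, -2.0, 0.5]  # hypothetical optimum, for illustration only
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(dim=3, popsize=20, elite=5, sigma=0.5, generations=100):
    """Sample a population around the mean, rank by fitness, and recombine
    the elite into the next mean. No gradients are ever computed."""
    mean = [0.0] * dim
    for _ in range(generations):
        pop = [[m + random.gauss(0, sigma) for m in mean]
               for _ in range(popsize)]
        pop.sort(key=fitness, reverse=True)
        mean = [sum(ind[i] for ind in pop[:elite]) / elite
                for i in range(dim)]
        sigma *= 0.95  # crude stand-in for CMA-ES step-size adaptation
    return mean

print(evolve())  # drifts toward the target [1.0, -2.0, 0.5]
```

The key point survives the simplification: the optimizer only ever sees parameter vectors going in and fitness scores coming out.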

What still feels messy

How to design a reward or feature set that encourages meaningful exploration rather than simple repetitive behavior.

Next step

Modify the environment to only reward time alive and expose a “near-food” feature