
Paper Club - Intrinsic Motivation For RL


What I worked on

Read a few papers on curiosity and intrinsic motivation.

Title | Authors | Year | Link
Intrinsic Motivation For Reinforcement Learning Systems | Barto, Andrew G. | n.d. |
Learning to Play with Intrinsically-Motivated Self-Aware Agents | Haber, Nick; Mrowca, Damian; Fei-Fei, Li; Yamins, Daniel LK | 2018 | https://doi.org/10.48550/arXiv.1802.07442
Computational Theories of Curiosity-Driven Learning | Oudeyer, Pierre-Yves | 2018 | https://doi.org/10.48550/arXiv.1802.10546
Intrinsic Motivation Systems for Autonomous Mental Development | Oudeyer, Pierre-Yves; Kaplan, Frédéric; Hafner, Verena V. | 2007 | https://doi.org/10.1109/TEVC.2006.890271
How Can We Define Intrinsic Motivation? | Oudeyer, Pierre-Yves; Kaplan, Frédéric | n.d. |
Curiosity-Driven Exploration by Self-Supervised Prediction | Pathak, Deepak; Agrawal, Pulkit; Efros, Alexei A.; Darrell, Trevor | 2017 | https://doi.org/10.48550/arXiv.1705.05363
Don’t Do What Doesn’t Matter: Intrinsic Motivation with Action Usefulness | Seurin, Mathieu; Strub, Florian; Preux, Philippe; Pietquin, Olivier | 2021 | https://doi.org/10.48550/arXiv.2105.09992

What I noticed

  • I like the idea of the agent learning the environment dynamics rather than comparing an effectively infinite number of states. Actions are finite, so it scales.
  • Curiosity-based exploration has historically centred on a forward model $f(s_t, a_t) = s_{t+1}$ that tries to predict the next state (or its features). A high prediction error is used as the intrinsic reward because it means something surprising happened (see the sketch after this list).
  • Memory didn’t come up much, but I like it (see the memory sketch after this list).
    • Store state vectors (this allows for easy similarity comparisons)
    • Count actions
  • Learnt about an “inverse model” $a_t = f_{inv}(s_t, s_{t+1})$ that predicts the action which caused the transition (also covered in the sketch below).
  • Curiosity is “an efficient way to bootstrap learning when there is no information” - Oudeyer
  • World models seem to be the backbone for predicting action consequences. Love how this ties back into my earlier work.
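To make the forward/inverse pairing concrete, here is a minimal sketch in the spirit of Pathak et al.’s ICM. It is an assumption-laden toy: the MLP sizes, the feature dimension, and the `beta` weighting between the two losses are placeholders I picked, not values from the paper.

```python
# Minimal ICM-style module (in the spirit of Pathak et al., 2017).
# Layer sizes, feat_dim and beta are placeholder choices, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    def __init__(self, obs_dim, n_actions, feat_dim=32):
        super().__init__()
        self.n_actions = n_actions
        # phi: raw observation -> feature vector
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # forward model: (phi(s_t), a_t) -> predicted phi(s_{t+1})
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # inverse model: (phi(s_t), phi(s_{t+1})) -> logits over a_t
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, s_t, a_t, s_next):
        phi_t, phi_next = self.encoder(s_t), self.encoder(s_next)
        a_onehot = F.one_hot(a_t, self.n_actions).float()
        # Forward prediction error = intrinsic reward ("surprise").
        # Features are detached here so only the inverse loss shapes the encoder.
        phi_next_pred = self.forward_model(
            torch.cat([phi_t.detach(), a_onehot], dim=-1))
        surprise = 0.5 * (phi_next_pred - phi_next.detach()).pow(2).sum(dim=-1)
        # Inverse model: which action caused this transition?
        action_logits = self.inverse_model(torch.cat([phi_t, phi_next], dim=-1))
        beta = 0.2  # placeholder trade-off between forward and inverse losses
        loss = beta * surprise.mean() + (1 - beta) * F.cross_entropy(action_logits, a_t)
        return surprise.detach(), loss  # intrinsic reward per transition, training loss
```

The per-transition intrinsic reward would then be added to (or, in a purely curiosity-driven setting, replace) the environment reward before the policy update.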
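And a rough sketch of the memory bullets above: keep visited state vectors around for similarity comparisons and count actions. The k-nearest-neighbour novelty bonus is my own guess at how those two pieces might be used together, not something taken from the papers.

```python
# Sketch of the memory idea: store visited state vectors for similarity
# lookups and count actions. The kNN novelty bonus is my own assumption.
from collections import Counter
import numpy as np

class EpisodicMemory:
    def __init__(self, k=5):
        self.states = []              # visited state/feature vectors
        self.action_counts = Counter()
        self.k = k

    def novelty_bonus(self, state_vec):
        # Mean distance to the k nearest stored vectors: far from anything
        # seen so far -> high bonus.
        if not self.states:
            return 1.0  # arbitrary bonus for the very first state
        dists = np.linalg.norm(np.stack(self.states) - state_vec, axis=1)
        return float(np.mean(np.sort(dists)[: self.k]))

    def update(self, state_vec, action):
        self.states.append(np.asarray(state_vec, dtype=np.float32))
        self.action_counts[action] += 1
```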

“Aha” Moment

  • n/a

What still feels messy

  • Barto says all reward is intrinsic if you move the critic from the external environment into the agent’s internal environment. I’m not sure I believe that in the general case, but it’s true enough for my scenario.
  • Is curiosity expressing positive or negative valence? Need to think about this more.

Next step

  • Come up with a new policy based on these ideas
  • If I could add JEPA as a way to represent state, that would be interesting. BUT JEPA requires SGD while I’m using ES, and I want dynamics, not representation.