
JEPA Valence Standalone Experiment


What I worked on

Thought about the critical path for using JEPA as a valence-based policy to guide action selection. If JEPA can't represent the world model and doesn't learn that energy exists, the policy can't work. A rough sketch of what I mean by valence-guided action selection is below.
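This is only a sketch of the idea, not the actual implementation; the names `encoder`, `predictor`, and `valence_head` are hypothetical stand-ins for JEPA's online encoder, its latent predictor, and some learned valence score over latents.

```python
# Sketch of valence-guided action selection (all names hypothetical).
# For each candidate action, JEPA predicts the next latent state; a
# valence head scores that latent, and the policy greedily picks the
# action with the highest predicted valence (e.g. expected energy).
import torch

def select_action(obs, actions, encoder, predictor, valence_head):
    """Pick the action whose predicted next latent scores highest."""
    z = encoder(obs)                          # current latent state
    scores = []
    for a in actions:
        z_next = predictor(z, a)              # JEPA-predicted next latent
        scores.append(valence_head(z_next))   # scalar valence per action
    return actions[int(torch.stack(scores).argmax())]
```

The point of the critical path: if the latents don't encode anything about energy, `valence_head` has nothing to work with and this loop degenerates to random choice.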

I created a standalone JEPA trainer that uses a dataset built from a random-walk policy. The hope is that this simple dataset covers enough transitions for JEPA to learn a useful representation of the environment and, most importantly, that energy exists. A minimal sketch of the setup follows.
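A minimal sketch of that setup, under assumptions that are mine rather than the trainer's: a toy `env` whose `step()` returns `(next_obs, done)`, and a standard JEPA objective where a predictor regresses the target encoder's next-state latent.

```python
# Sketch: collect (obs, action, next_obs) transitions with a uniform
# random policy, then train JEPA to predict the next latent. The env
# interface and loss details are illustrative assumptions.
import random
import torch
import torch.nn.functional as F

def collect_random_walk(env, n_steps):
    data, obs = [], env.reset()
    for _ in range(n_steps):
        action = random.choice(env.actions)   # random-walk policy
        next_obs, done = env.step(action)
        data.append((obs, action, next_obs))
        obs = env.reset() if done else next_obs
    return data

def jepa_loss(encoder, target_encoder, predictor, obs, action, next_obs):
    z = encoder(obs)
    z_pred = predictor(z, action)             # predict next latent
    with torch.no_grad():                     # stop-gradient target branch
        z_target = target_encoder(next_obs)
    return F.mse_loss(z_pred, z_target)
```

The stop-gradient on the target branch is the usual JEPA guard against representation collapse; in practice the target encoder is an EMA copy of the online encoder.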

[whiteboard photo]

What I noticed

  • My current probe is based on energy, and I think that's wrong. It should use the near_food and can_eat features instead, so the representation learns that energy is a consequence of exploring new states (see the probe sketch after this list).
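A sketch of what the changed probe might look like, assuming frozen JEPA latents and binary near_food / can_eat labels logged from the environment; `fit_probe` and its hyperparameters are hypothetical, not something already in the codebase.

```python
# Hypothetical probe rewrite: instead of regressing energy directly,
# fit a logistic-regression probe for each binary feature (near_food,
# can_eat) on frozen JEPA latents. If these decode well, energy should
# be recoverable as a downstream consequence.
import torch
import torch.nn as nn

def fit_probe(latents, labels, epochs=200, lr=1e-2):
    """Linear probe on frozen latents; latents: (N, D), labels: (N,)."""
    probe = nn.Linear(latents.shape[1], 1)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(latents).squeeze(-1), labels.float())
        loss.backward()
        opt.step()
    return probe  # one probe each for near_food and can_eat
```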

"Aha" Moment

n/a

What still feels messy

  • No guarantee that a random walk produces the right combination of actions to survive long enough to learn a good representation
  • JEPA is being trained off-policy, which pulls me further from my goal of continuous learning

Next step

  • Run the training and see if the learnt representation captures the environment well enough