A (attempted) JEPA Valence Policy

What I worked on

Created a new policy that uses JEPA to inform action selection. It doesn’t work at all…yet.

In theory…

  • JEPA will learn to predict a valid next scene for each of the available actions
  • Positive valence comes from the energy probe, which learns that moving forward provides access to food/energy
  • CMA-ES continues to black-box optimize, but gets pushed in the direction of agent survival because total reward is higher when the agent moves forward

flowchart TD

    %% --- JEPA Runtime ---
    subgraph JEPA["JEPA Module"]
        direction TB
        Zt["Encoder"]
        Zhat["Imagined Future"]
        Valid["Validity"]
        Delta["Energy Probe"]
    end

    %% --- Policy ---
    subgraph Policy["Policy"]
        direction TB
        Concat["Concat(Features, Valence Scores)"]
        MLP["MLP → logits"]
        Sample["Sample action"]
    end

    %% --- Flow ---
    s_t["Features"]
    s_t --> Zt
    Zt --> Zhat
    Zt --> Valid
    Zhat --> Valid
    Zhat --> Delta

    Valid --> Concat
    Delta --> Concat

    s_t --> Concat
    Concat --> MLP
    MLP --> Sample
    Sample --> Action["Action to Environment"]
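
Concretely, the diagram boils down to something like the sketch below at action-selection time. This is only an illustration of the idea, not the real code: the `jepa.score` API and the flat `policy_weights` layout are assumptions.

```python
import numpy as np

def select_action(features, jepa, policy_weights, n_actions, rng):
    """Choose an action from raw features plus JEPA valence scores.

    Assumed (illustrative) API: jepa.score(features, action) returns
    (validity, energy_delta) for the imagined next scene under that action.
    """
    scores = [jepa.score(features, a) for a in range(n_actions)]
    validity, energy_delta = (np.array(v) for v in zip(*scores))

    # Concat(Features, Valence Scores) -> MLP -> logits, as in the diagram
    x = np.concatenate([features, validity, energy_delta])
    W1, b1, W2, b2 = policy_weights          # flat parameters evolved by CMA-ES
    hidden = np.tanh(W1 @ x + b1)
    logits = W2 @ hidden + b2

    # Sample an action from the softmax over the logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(n_actions, p=probs)
```

CMA-ES still only sees the episode's total reward; the valence scores are just two extra inputs the evolved MLP can learn to lean on.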

What I noticed

  • Initially I had a local JEPA instance inside the policy, but it wasn’t learning anything. I switched to a global JEPA to get a better representation across all generations (see the sketch below)
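
A rough sketch of where the shared instance sits. The names here (`evaluate_generation`, `make_policy`, `rollout`, `train_on`) are placeholders, not the actual code; the only point is that every candidate feeds transitions into the same JEPA instead of a per-policy copy.

```python
def evaluate_generation(candidates, env, global_jepa, make_policy, rollout):
    """One CMA-ES generation sharing a single global JEPA.

    make_policy(params, jepa) and rollout(env, policy) stand in for the
    evolved policy and the episode loop; global_jepa.train_on(...) is an
    assumed training hook.
    """
    fitnesses, transitions = [], []
    for params in candidates:
        policy = make_policy(params, jepa=global_jepa)   # shared, not local
        episode = rollout(env, policy)                   # list of (s, a, s_next, r)
        fitnesses.append(sum(step[3] for step in episode))
        transitions.extend((s, a, s2) for s, a, s2, _ in episode)
    global_jepa.train_on(transitions)   # one model sees data from every generation
    return fitnesses
```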

“Aha” Moment

n/a

What still feels messy

  • JEPA needs to see enough scenarios where moving forward increases energy. This is the same limitation CMA-ES has. The difference, I hope, is that JEPA chooses actions based on an imagined future that maximizes energy rather than reward (toy sketch below)
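
To make that distinction concrete: CMA-ES only ever sees the scalar episode reward after the fact, while the JEPA path can rank candidate actions up front by the energy its probe predicts for each imagined next scene. A toy sketch, with an assumed `imagine`/`energy_probe` API:

```python
import numpy as np

def greedy_energy_action(features, jepa, n_actions):
    """Rank actions by the energy the probe predicts for each imagined future.

    Assumed (illustrative) API: jepa.imagine(features, a) returns the latent
    of the imagined next scene, and jepa.energy_probe(latent) returns the
    predicted energy delta for that scene.
    """
    predicted = [jepa.energy_probe(jepa.imagine(features, a))
                 for a in range(n_actions)]
    return int(np.argmax(predicted))
```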

Next step

  • Come up with some experiments to prove this works