A (attempted) JEPA Valence Policy
What I worked on
Created a new policy that uses JEPA to inform action selection. It doesn’t work at all…yet.
In theory…
- JEPA will learn a valid next scene based on the set of actions available
- Positive valence comes from a probe that learns that moving forward provides access to food/energy
- CMA-ES continues to black-box optimize, but is pushed in the direction of agent survival because total reward is higher when the agent moves forward (a rough sketch of this pipeline follows the diagram below)
```mermaid
flowchart TD
    %% --- JEPA Runtime ---
    subgraph JEPA["JEPA Module"]
        direction TB
        Zt["Encoder"]
        Zhat["Imagined Future"]
        Valid["Validity"]
        Delta["Energy Probe"]
    end

    %% --- Policy ---
    subgraph Policy["Policy"]
        direction TB
        Concat["Concat(Features, Valence Scores)"]
        MLP["MLP → logits"]
        Sample["Sample action"]
    end

    %% --- Flow ---
    s_t["Features"]
    s_t --> Zt
    Zt --> Zhat
    Zt --> Valid
    Zhat --> Valid
    Zhat --> Delta
    Valid --> Concat
    Delta --> Concat
    s_t --> Concat
    Concat --> MLP
    MLP --> Sample
    Sample --> Action["Action to Environment"]
```
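In code, the pipeline in the diagram looks roughly like this. It's a minimal PyTorch sketch, not the actual implementation: the layer sizes, the linear probes, and the single unconditioned predictor are placeholder assumptions.

```python
import torch
import torch.nn as nn

class JEPAValencePolicy(nn.Module):
    """Sketch of the diagram above: encode features, imagine a future,
    score it with the validity and energy probes, and feed those valence
    scores alongside the raw features into the policy MLP."""

    def __init__(self, feat_dim=32, latent_dim=16, n_actions=4):
        super().__init__()
        # JEPA module (in practice a single global instance, see below)
        self.encoder = nn.Linear(feat_dim, latent_dim)        # Features -> z_t
        self.predictor = nn.Linear(latent_dim, latent_dim)    # z_t -> imagined future z_hat
        self.validity_probe = nn.Linear(2 * latent_dim, 1)    # (z_t, z_hat) -> validity score
        self.energy_probe = nn.Linear(latent_dim, 1)          # z_hat -> predicted energy delta
        # Policy head: Concat(Features, Valence Scores) -> logits
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, 64), nn.Tanh(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s_t):
        z_t = self.encoder(s_t)
        z_hat = self.predictor(z_t)                                      # Imagined Future
        valid = self.validity_probe(torch.cat([z_t, z_hat], dim=-1))     # Validity
        delta = self.energy_probe(z_hat)                                 # Energy Probe
        x = torch.cat([s_t, valid, delta], dim=-1)                       # Concat
        logits = self.mlp(x)                                             # MLP -> logits
        return torch.distributions.Categorical(logits=logits).sample()   # Sample action


policy = JEPAValencePolicy()
action = policy(torch.randn(1, 32))   # Features in, action index out (to the environment)
```

The point of the design is that the probe outputs are just two extra input features to the policy MLP, so CMA-ES still optimizes the policy weights as a black box; the valence scores only bias what the MLP gets to condition on.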
What I noticed
- Initially I had a local JEPA instance inside each policy, but that wasn't learning anything. I switched to a single global JEPA to get a better representation across all generations.
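The switch mostly amounts to building one JEPA up front and handing every policy a reference to it, instead of letting each policy construct its own. A rough, self-contained sketch with placeholder class names (not my actual classes):

```python
import torch.nn as nn

class JEPAModule(nn.Module):
    """Placeholder for the encoder/predictor/probe bundle."""
    def __init__(self, feat_dim=32, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, latent_dim)
        self.predictor = nn.Linear(latent_dim, latent_dim)

class Policy(nn.Module):
    """Each policy keeps its own MLP head but shares the JEPA."""
    def __init__(self, jepa, feat_dim=32, n_actions=4):
        super().__init__()
        self.jepa = jepa                              # reference, not a copy
        self.mlp = nn.Linear(feat_dim + 2, n_actions)

# Before: each policy owned a fresh JEPA that only ever saw its own rollouts.
# local_policies = [Policy(JEPAModule()) for _ in range(16)]

# After: one global JEPA trained on transitions from every rollout, so its
# representation accumulates across generations instead of resetting.
shared_jepa = JEPAModule()
policies = [Policy(shared_jepa) for _ in range(16)]
```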
“Aha” Moment
n/a
What still feels messy
- JEPA needs to see enough scenarios where moving forward increases energy; that's the same limitation CMA-ES has. The difference, I hope, is that JEPA chooses actions based on an imagined future that maximizes energy rather than reward (a quick sanity check is sketched below).
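The sanity check I have in mind: over logged transitions, does moving forward actually raise energy, and does the energy probe's prediction track the observed change at all? The sketch below uses random stand-in data and placeholder modules, so the printed numbers mean nothing here.

```python
import torch
import torch.nn as nn

# Stand-ins for the global JEPA pieces (same shapes as the sketch above).
encoder = nn.Linear(32, 16)
predictor = nn.Linear(16, 16)
energy_probe = nn.Linear(16, 1)

# Hypothetical logged transitions: features, action taken, observed energy change.
# In my setup these would come from rollouts; here they're random stand-ins.
feats = torch.randn(512, 32)
actions = torch.randint(0, 4, (512,))          # assume action 0 == "move forward"
observed_delta = torch.randn(512)

predicted_delta = energy_probe(predictor(encoder(feats))).squeeze(-1)

# 1) Has the environment actually shown JEPA that moving forward pays off?
print("observed delta | forward:", observed_delta[actions == 0].mean().item())
print("observed delta | other  :", observed_delta[actions != 0].mean().item())

# 2) Does the probe's prediction track the observed change at all?
corr = torch.corrcoef(torch.stack([predicted_delta, observed_delta]))[0, 1]
print("probe vs. observed correlation:", corr.item())
```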
Next step
- Come up with some experiments to prove this works
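The first experiment that comes to mind is an ablation: run the same CMA-ES loop with the valence scores intact and with them zeroed out, then compare survival. Below is only a harness sketch; `run_generation` is a random stand-in for my actual generation loop, so the output is meaningless until it's wired up.

```python
import random
import statistics

def run_generation(use_valence: bool) -> float:
    """Placeholder for one CMA-ES generation: evolve the population,
    roll each policy out, return mean survival time in steps."""
    # Random stand-in so the harness runs end to end.
    return random.gauss(100.0 + (10.0 if use_valence else 0.0), 5.0)

def experiment(n_generations: int = 20, n_seeds: int = 5):
    """Ablation: identical runs except the valence scores are zeroed out."""
    for use_valence in (True, False):
        results = [
            max(run_generation(use_valence) for _ in range(n_generations))
            for _ in range(n_seeds)
        ]
        label = "with valence" if use_valence else "valence zeroed"
        print(f"{label}: best survival {statistics.mean(results):.1f} "
              f"± {statistics.stdev(results):.1f} steps over {n_seeds} seeds")

experiment()
```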