Understanding State-Space Models

What I worked on

Worked through the state-space equations in the Mamba model, in particular what h’(t) means in relation to h(t) and y(t). Also compared continuous-time and discrete-time models and how each processes an input sequence.
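For reference, this is the continuous-time state-space form as I understand it from the Mamba paper. The input symbol x(t) and the matrix names A, B, C are the usual notation and are assumed here, since these notes only name h and y.

```latex
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t) \\
y(t)  &= C\,h(t)
\end{aligned}
```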

What I noticed

h’(t) is the rate of change of the hidden state h(t)
The first equation models the state transition; the second produces the output y(t)
Continuous-time models use differential equations; discrete-time models use step-based updates
The implementation involves Conv1d layers and SiLU activations
The dt_proj bias is initialized with an inverse softplus, so that applying softplus in the forward pass gives a positive step size in the intended range
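A minimal sketch of the inverse-softplus idea from the last point, in plain Python rather than the actual Mamba code. The function names and the target step sizes are my own choices for illustration. The point is that if the bias is set to inverse_softplus(dt), then softplus(bias) recovers dt.

```python
import math

def softplus(x):
    # log(1 + exp(x)), written so exp() never overflows for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def inverse_softplus(y):
    # log(exp(y) - 1), rewritten as y + log(1 - exp(-y)); requires y > 0
    return y + math.log(-math.expm1(-y))

# Hypothetical target step sizes, chosen only for illustration
for dt in (0.001, 0.01, 0.1):
    bias = inverse_softplus(dt)
    print(f"dt={dt}  bias={bias:.4f}  softplus(bias)={softplus(bias):.6f}")
```

Small step sizes map to negative biases, and the softplus in the forward pass keeps the resulting step size positive whatever the projection adds to the bias.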

"Aha" Moment

h’(t) is what lets the model evolve its state without storing an explicit history: it describes how the state changes, not just its current value.

What still feels messy

Still fuzzy on how discretization in Mamba is implemented mathematically and how it differs from an RNN’s step.
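A starting point for untangling this, as far as I can tell from the Mamba paper: the zero-order-hold rule turns the continuous parameters and a step size Δ into discrete ones, and the state update then becomes a recurrence. The symbols x_t, A, B, C and Δ are the paper's usual notation and are assumed here.

```latex
\begin{aligned}
\bar{A} &= \exp(\Delta A) \\
\bar{B} &= (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B \\
h_t &= \bar{A}\,h_{t-1} + \bar{B}\,x_t \\
y_t &= C\,h_t
\end{aligned}
```

Compared with a vanilla RNN step such as h_t = tanh(W h_{t-1} + U x_t), the update here is linear in h_{t-1}, and the discrete matrices are derived from a step size instead of being learned directly. In Mamba, Δ, B and C also depend on the input.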

Next step

Step through the forward pass in the Mamba code with a simple phoneme example to see how state evolves frame by frame.
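Before opening the real code, a toy version of this step-through might look like the sketch below. It is not the Mamba forward pass: it assumes a diagonal state matrix, a fixed step size, zero-order-hold discretization, and one scalar input per frame. All values and function names are hypothetical stand-ins for real phoneme features.

```python
import math

def discretize(a, b, delta):
    # Zero-order hold for one diagonal state channel (a must be nonzero)
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_ssm(frames, a_diag, b, c, delta):
    # h_t = a_bar * h_{t-1} + b_bar * x_t per channel, y_t = sum(c * h_t)
    params = [discretize(a, b_i, delta) for a, b_i in zip(a_diag, b)]
    h = [0.0] * len(a_diag)
    trace = []
    for x in frames:
        h = [a_bar * h_i + b_bar * x for (a_bar, b_bar), h_i in zip(params, h)]
        y = sum(c_i * h_i for c_i, h_i in zip(c, h))
        trace.append((list(h), y))
    return trace

# Hypothetical toy values: one fast-decaying and one slow-decaying channel
frames = [1.0, 0.0, 0.0, 0.5]
a_diag = [-1.0, -0.1]
b = [1.0, 1.0]
c = [1.0, 1.0]

trace = run_ssm(frames, a_diag, b, c, delta=0.1)
for t, (h, y) in enumerate(trace):
    print(f"frame {t}: h={[round(v, 4) for v in h]} y={y:.4f}")
```

On the frames where the input is zero, each channel shrinks by its own factor exp(delta * a), which is the state carrying earlier input forward without any stored history.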
