Understanding State-Space Models
What I worked on
Tried to understand the equations in the Mamba model and what h’(t) means in relation to h(t) and y(t). Looked at the difference between continuous and discrete-time models and how they process input sequences.
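For reference, these are the two continuous-time equations as they are usually written for state-space models. Here x(t) is the input signal and A, B, C are the learned parameters; the notation follows the common convention and may differ slightly from a given codebase.

```latex
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t) \\
y(t)  &= C\,h(t)
\end{aligned}
```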
What I noticed
- h’(t) is the rate of change of the hidden state h(t)
- The first equation models state transitions, the second produces the output
- Continuous-time models use differential equations, discrete ones use step-based updates
- The implementation uses Conv1d layers and activations such as SiLU
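- The dt_proj bias is initialized with an inverse softplus of the target step sizes, so that softplus in the forward pass gives back a small positive step size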
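The inverse softplus idea from the last bullet can be checked in a few lines. This is a minimal sketch using only the standard library, not the Mamba code itself: the function names and the sample step sizes are my own illustrative choices. It shows that a bias computed with inverse softplus maps back to the intended step size once softplus is applied.

```python
import math

def softplus(x):
    # log(1 + exp(x)), rearranged so large |x| does not overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def inverse_softplus(y):
    # Solve softplus(x) = y for y > 0:
    # x = log(exp(y) - 1) = y + log(1 - exp(-y)) = y + log(-expm1(-y))
    return y + math.log(-math.expm1(-y))

# Illustrative target step sizes (assumed values, not read from any config)
for dt in (0.001, 0.01, 0.1):
    bias = inverse_softplus(dt)
    print(f"dt={dt}  bias={bias:.4f}  softplus(bias)={softplus(bias):.6f}")
```

Small step sizes map to negative biases, and softplus guarantees the recovered step size is positive whatever the bias later becomes during training.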
“Aha” Moment
That h’(t) is what lets the model evolve its state without storing explicit history: it describes how the state changes, not just its current value.
What still feels messy
Still fuzzy on how discretization in Mamba is implemented mathematically and how it differs from an RNN’s step.
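One way to make the discretization concrete is zero-order hold (ZOH), the rule described in the Mamba paper. Below is a minimal sketch for a scalar state, assuming fixed a, b, c, and dt (all names and values are illustrative). In Mamba the step size and the B and C terms depend on the input, which this sketch leaves out. Note that the update is linear in h, with no nonlinearity applied to the state, which is one difference from a classic RNN step.

```python
import math

def discretize_zoh(a, b, dt):
    # Zero-order hold for the scalar system h'(t) = a*h(t) + b*x(t), with a != 0.
    # The input is assumed constant across each step of length dt.
    a_bar = math.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_ssm(xs, a=-1.0, b=1.0, c=1.0, dt=0.1):
    # Discrete recurrence: h_k = a_bar * h_{k-1} + b_bar * x_k, y_k = c * h_k
    a_bar, b_bar = discretize_zoh(a, b, dt)
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x  # linear state update, frame by frame
        ys.append(c * h)
    return ys

# Constant input of 1.0 for 50 frames (total time 5.0)
ys = run_ssm([1.0] * 50)
print(ys[0], ys[-1])
```

For a constant input, ZOH reproduces the continuous solution exactly at the sample points, so with these values the last output matches 1 - exp(-5) up to floating-point error.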
Next step
Step through the forward pass in the Mamba code with a simple phoneme example to see how state evolves frame by frame.