Understanding State-Space Models

What I worked on

Tried to understand the state-space equations behind the Mamba model and what h’(t) means in relation to h(t) and y(t). Looked at the difference between continuous- and discrete-time models and how each processes an input sequence.
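
For reference, the continuous-time state-space model that Mamba (following S4) starts from is usually written as below, with x(t) the input, h(t) the hidden state, h’(t) its time derivative, and y(t) the output. A skip term D x(t) is often added to the output; it is left out here.

```latex
\begin{aligned}
h'(t) &= A\,h(t) + B\,x(t) \\
y(t)  &= C\,h(t)
\end{aligned}
```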

What I noticed

  • h’(t) is the rate of change (time derivative) of the hidden state h(t)
  • The first equation describes how the state evolves; the second maps the state to the output y(t)
  • Continuous-time models use differential equations; discrete-time models use step-by-step updates
  • The implementation uses Conv1d layers and activations such as SiLU
  • The dt_proj bias is initialized with the inverse of softplus, so that softplus maps it back to a step size Δ in a chosen, stable range at initialization
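
A minimal sketch of that initialization in plain Python, assuming it follows what I read in the reference mamba_simple.py: sample Δ log-uniformly between dt_min and dt_max, then store softplus⁻¹(Δ) as the bias so that softplus(bias) gives back Δ on the first forward pass. The default values are my assumption of the reference defaults, and the function names are my own, not the mamba_ssm API.

```python
import math
import random


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inverse_softplus(y: float) -> float:
    # For y > 0: x = log(e^y - 1), rewritten as y + log(1 - e^-y) to avoid overflow
    return y + math.log(-math.expm1(-y))


def init_dt_bias(d_inner: int, dt_min: float = 1e-3, dt_max: float = 1e-1,
                 dt_init_floor: float = 1e-4, seed: int = 0) -> list[float]:
    """One bias per channel so that softplus(bias) starts in [dt_min, dt_max].

    Defaults are assumptions, meant to mirror the reference implementation.
    """
    rng = random.Random(seed)
    biases = []
    for _ in range(d_inner):
        # Sample the step size log-uniformly between dt_min and dt_max
        dt = math.exp(rng.uniform(math.log(dt_min), math.log(dt_max)))
        dt = max(dt, dt_init_floor)  # only matters if dt_min is set below the floor
        biases.append(inverse_softplus(dt))
    return biases
```

For comparison, a bias of 0 would give softplus(0) = log 2 ≈ 0.69 for every channel, far above the intended range of small step sizes.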

“Aha” Moment

That h’(t) is what lets the model evolve its state without keeping an explicit history: it predicts the change, not just the current value.
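
One way to see this, using the simplest possible discretization (forward Euler, chosen here only for intuition; it is not the rule Mamba uses): with h’(t) = A h(t) + B x(t), the derivative alone is enough to carry the state forward by one step of size Δ.

```latex
h(t + \Delta) \approx h(t) + \Delta\, h'(t) = (I + \Delta A)\, h(t) + \Delta B\, x(t)
```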

What still feels messy

Still fuzzy on how discretization in Mamba is implemented mathematically and how it differs from an RNN’s step.
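
A minimal sketch of the discretization for a single diagonal entry of A (a scalar a), assuming the zero-order-hold (ZOH) rule from the Mamba paper: Ā = exp(ΔA) and B̄ = (ΔA)⁻¹(exp(ΔA) − I)·ΔB. As far as I can tell, the reference scan in the Mamba repo keeps exp(ΔA) but uses the simpler B̄ ≈ ΔB, so both are shown. All function names are my own.

```python
import math


def discretize_zoh(a: float, b: float, delta: float) -> tuple[float, float]:
    """Zero-order hold for a scalar SSM h'(t) = a*h(t) + b*x(t), with a != 0.

    A_bar = exp(delta * a)
    B_bar = (delta * a)^-1 * (exp(delta * a) - 1) * delta * b = (exp(delta * a) - 1) / a * b
    """
    a_bar = math.exp(delta * a)
    b_bar = math.expm1(delta * a) / a * b  # expm1 avoids cancellation for small delta*a
    return a_bar, b_bar


def discretize_simplified(a: float, b: float, delta: float) -> tuple[float, float]:
    """Same A_bar, but B_bar approximated to first order as delta * b."""
    return math.exp(delta * a), delta * b


def run_ssm(xs, a, b, c, delta, discretize=discretize_zoh):
    """Unroll h_k = A_bar*h_{k-1} + B_bar*x_k and y_k = c*h_k, starting from h = 0."""
    a_bar, b_bar = discretize(a, b, delta)
    h, ys = 0.0, []
    for x in xs:
        h = a_bar * h + b_bar * x  # linear in h: no tanh/sigmoid, unlike a vanilla RNN step
        ys.append(c * h)
    return h, ys


if __name__ == "__main__":
    a_bar, b_zoh = discretize_zoh(-1.0, 1.0, 0.01)
    _, b_simple = discretize_simplified(-1.0, 1.0, 0.01)
    print(a_bar, b_zoh, b_simple)  # the two B_bar values differ only at second order in delta
```

The contrast with an RNN step, as I currently understand it: an RNN learns its transition weights directly and applies a nonlinearity inside the recurrence, whereas here Ā and B̄ are derived from the continuous A, B and the step size Δ, and the update stays linear in h, which is what makes a parallel scan possible.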

Next step

Step through the forward pass in the Mamba code with a simple phoneme example to see how the state evolves frame by frame.
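
A toy version of that frame-by-frame view, written before stepping through the real code and so only a sketch under my own assumptions: one channel, a diagonal A with negative entries, and Δ, B, C computed from the current input as in Mamba's selective scan (Δ through a softplus, B and C as bias-free linear functions of the input), with B̄ simplified to ΔB. All weights and the "phoneme" feature track are made up, and the function name is hypothetical, not part of the mamba_ssm API.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x); keeps the step size delta positive
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def selective_scan_single_channel(xs, A, w_dt, b_dt, W_B, W_C, D=1.0):
    """Toy selective scan for ONE channel with an N-dim diagonal state.

    xs:         list of scalar inputs, one per frame
    A:          list of N negative floats (diagonal of the state matrix)
    w_dt, b_dt: hypothetical weights; delta_t = softplus(w_dt * x_t + b_dt)
    W_B, W_C:   hypothetical weights; B_t = W_B * x_t and C_t = W_C * x_t (no bias)
    D:          skip weight, y_t = C_t . h_t + D * x_t
    Returns (output per frame, copy of the state after each frame).
    """
    n_state = len(A)
    h = [0.0] * n_state
    ys, states = [], []
    for x in xs:
        delta = softplus(w_dt * x + b_dt)  # input-dependent step size
        B_t = [wb * x for wb in W_B]       # input-dependent input matrix
        C_t = [wc * x for wc in W_C]       # input-dependent output matrix
        for n in range(n_state):
            # A_bar = exp(delta * A); B_bar * x simplified to delta * B_t * x
            h[n] = math.exp(delta * A[n]) * h[n] + delta * B_t[n] * x
        y = sum(C_t[n] * h[n] for n in range(n_state)) + D * x
        ys.append(y)
        states.append(list(h))
    return ys, states


if __name__ == "__main__":
    # Made-up 1-D feature track: silence, two "voiced" frames, silence
    frames = [0.0, 1.0, 1.0, 0.0, 0.0]
    ys, states = selective_scan_single_channel(
        frames, A=[-1.0, -2.0, -3.0, -4.0], w_dt=1.0, b_dt=-1.0,
        W_B=[1.0, 1.0, 1.0, 1.0], W_C=[0.5, 0.5, 0.5, 0.5],
    )
    for t, (x, y, h) in enumerate(zip(frames, ys, states)):
        print(f"frame {t}: x={x:.1f} y={y:.3f} h={[round(v, 3) for v in h]}")
```

With positive W_B, the state rises over the voiced frames and decays back toward zero on the silent ones, and the more negative entries of A decay faster.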