SpeechBrain Architecture
• 1 min read
What I worked on
Explored SpeechBrain’s internal structure and pre-trained model APIs.
What I noticed
- Recipes are full experiment pipelines (data prep, training, eval).
- Lobes are modular components (models, layers, extractors) used across recipes.
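- EncoderASR wraps encoder-only models (typically decoded with CTC), while EncoderDecoderASR wraps models with an encoder plus an attention decoder; both transcribe speech to text, and both expose encode_batch for extracting encoder features without decoding.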
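A minimal sketch of the pre-trained interface, to make the encode/transcribe split concrete. It assumes the `speechbrain/asr-crdnn-rnnlm-librispeech` checkpoint on the Hugging Face hub and a hypothetical local file `example.wav`; the helper names `load_pretrained_asr` and `encode_and_transcribe` are made up for this note and are not part of SpeechBrain.

```python
def load_pretrained_asr(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
):
    """Download (or reuse a cached copy of) a pre-trained encoder-decoder ASR model."""
    # Imported lazily so the helper below does not need SpeechBrain at import time.
    try:
        from speechbrain.inference.ASR import EncoderDecoderASR  # SpeechBrain 1.0+
    except ImportError:
        from speechbrain.pretrained import EncoderDecoderASR  # older releases
    return EncoderDecoderASR.from_hparams(source=source, savedir=savedir)


def encode_and_transcribe(asr_model, wav_path):
    """Return (encoder_states, transcript) for a single audio file."""
    wav = asr_model.load_audio(wav_path)  # 1-D waveform tensor prepared for the model
    wavs = wav.unsqueeze(0)               # add a batch dimension: [1, time]
    wav_lens = wav.new_ones(1)            # relative lengths; 1.0 means no padding
    encoder_states = asr_model.encode_batch(wavs, wav_lens)
    words, _tokens = asr_model.transcribe_batch(wavs, wav_lens)
    return encoder_states, words[0]


# Usage (downloads the checkpoint on the first call):
# asr = load_pretrained_asr()
# states, text = encode_and_transcribe(asr, "example.wav")
```

The second helper only relies on `load_audio`, `encode_batch`, and `transcribe_batch`, so it should work unchanged with an `EncoderASR` model loaded the same way.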
“Aha” moment
SpeechBrain is designed with modularity in mind — lobes for composable building blocks, recipes for reproducible experiments.
What still feels messy
How to adapt lobes or recipes for phoneme-level outputs when a direct model isn’t provided.
Next step
Experiment with feature extraction using EncoderASR and explore writing a minimal phoneme recognition recipe.
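One piece a minimal phoneme recipe would probably need is turning frame-level predictions into a phoneme sequence. The sketch below assumes a CTC-trained model (my assumption, not something settled above) and shows greedy CTC decoding in plain Python; the label map and frame ids are hypothetical, and the function is a stand-in for illustration, not SpeechBrain's own decoder.

```python
from itertools import groupby


def ctc_greedy_phonemes(frame_ids, id_to_phoneme, blank_id=0):
    """Collapse per-frame argmax ids into a phoneme sequence (greedy CTC decoding)."""
    # Step 1: merge consecutive repeats of the same id.
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    # Step 2: drop the blank symbol, then map the remaining ids to phoneme labels.
    return [id_to_phoneme[idx] for idx in collapsed if idx != blank_id]


# Hypothetical label map and frame-level predictions, for illustration only.
labels = {1: "k", 2: "ae", 3: "t"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_phonemes(frames, labels))  # ['k', 'ae', 't']
```

A blank between two identical ids keeps them separate, so `[1, 0, 1]` decodes to two phonemes rather than one.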