
PaperClub - Learning as Compilation


What I worked on

Read a set of papers that all circle the same loop: experience → reflection → consolidation. A mix of RL, agents, long-context, and fast-adaptation methods.

| Title | Authors | Year | Link |
| --- | --- | --- | --- |
| AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness | Lou et al. | 2026 | https://doi.org/10.48550/arXiv.2603.03329 |
| Text-to-LoRA: Instant Transformer Adaptation | Charakorn et al. | 2025 | https://doi.org/10.48550/arXiv.2506.06105 |
| Experiential Reinforcement Learning | Shi et al. | 2026 | https://doi.org/10.48550/arXiv.2602.13949 |
| Recursive Language Models | Zhang et al. | 2025 | https://doi.org/10.48550/arXiv.2512.24601 |
| Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? | Gloaguen et al. | 2026 | https://doi.org/10.48550/arXiv.2602.11988 |
| Executable Code Actions Elicit Better LLM Agents | Wang et al. | 2024 | https://doi.org/10.48550/arXiv.2402.01030 |

What I noticed

  • Different subfields are converging on the same pattern from different directions: Experience → Reflection → Consolidation → Deployment

    • RL people call it experiential learning
    • Agent people call it reflection
    • LLM people call it distillation or adapters
    • Systems people call it harnessing
  • AutoHarness: a small model writes guardrail code or a full policy that outperforms bigger models. Once the behavior is compiled into code, you do not need the LLM at decision time.
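The way I read this, the shape of the idea is roughly the following: a model synthesizes a small decision function once, and from then on only the compiled function runs. The `fake_llm_synthesize` stub and the risk-threshold guardrail are my own illustration, not the paper's actual pipeline:

```python
# Sketch: use a model once to write a decision function, then drop the model.
# `fake_llm_synthesize` stands in for the small LLM that writes the harness.

def fake_llm_synthesize(spec: str) -> str:
    # In the paper a model emits this code from a spec; here it is hard-coded.
    return "def guardrail(action):\n    return action['risk'] < 0.5"

namespace = {}
exec(fake_llm_synthesize("block risky actions"), namespace)
guardrail = namespace["guardrail"]   # compiled once, reused at every decision

print(guardrail({"risk": 0.2}))      # True
print(guardrail({"risk": 0.9}))      # False
```

After synthesis, decision time is a plain function call: no tokens, no inference cost.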

  • ERL: explicit reflection step converts feedback into behavioral updates, then internalizes the improvement so inference stays cheap.

  • RLM: treat long context as environment, not input. The model navigates it instead of ingesting it.
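A toy version of "context as environment": expose navigation primitives over chunks and let a controller search, instead of ingesting everything. The keyword-count scoring below stands in for the model's navigation decisions; everything here is illustrative:

```python
# Sketch: navigate a long document instead of reading all of it.
# A real RLM would make these choices with a model; here a keyword count decides.

def navigate(chunks, query, budget=3):
    """Visit at most `budget` chunks, greedily picking the most promising one."""
    visited = []
    for _ in range(budget):
        scores = [(c.lower().count(query), i)
                  for i, c in enumerate(chunks) if i not in visited]
        hits, best = max(scores)     # "policy": chunk with the most query hits
        if hits == 0:
            break                    # nothing promising left; stop early
        visited.append(best)
    return [chunks[i] for i in visited]

doc = ["intro text", "methods: compilation details",
       "results on compilation", "appendix"]
print(navigate(doc, "compilation"))
```

The attention cost scales with the chunks visited, not the full document length.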

  • Text-to-LoRA: train hypernetworks that convert task descriptions directly into adapters in one pass. Basically, this compiles experience into weights without full fine-tuning.
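The core mechanism can be sketched in a few lines: a fixed map turns a task embedding into low-rank factors A and B in a single forward pass, with no training loop at adaptation time. The shapes and the random linear "hypernetwork" are illustrative, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, e = 8, 2, 4                       # model dim, LoRA rank, task-embedding dim

W = rng.normal(size=(d, d))             # frozen base weight
H = rng.normal(size=(e, 2 * r * d))     # "hypernetwork": embedding -> adapter params

def text_to_lora(task_embedding):
    params = task_embedding @ H         # one pass: embedding -> all adapter entries
    A = params[: r * d].reshape(r, d)   # down-projection factor
    B = params[r * d :].reshape(d, r)   # up-projection factor
    return A, B

A, B = text_to_lora(rng.normal(size=e))
W_adapted = W + B @ A                   # standard LoRA update, produced from "text"
print(W_adapted.shape)                  # (8, 8)
```

Adaptation cost collapses to one matrix multiply, rather than a fine-tuning run.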

  • CodeAct: executable code is a better action space than JSON because it preserves state, control flow, and composition. Actions become programs, not tokens.
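The contrast is easy to see side by side. Below, the toy `search` tool and both action formats are my own illustration: a JSON action is one opaque call, while a code action carries its own loop and accumulator:

```python
# Toy tool; returns a fake result so the example is self-contained.
def search(q):
    return len(q)

# JSON-style action: one call per step, no state or control flow in the action.
json_action = {"tool": "search", "args": {"q": "lora"}}
result = search(**json_action["args"])

# Code action: a small program that loops, keeps state, and composes calls.
code_action = """
total = 0
for q in ["lora", "harness"]:
    total += search(q)
"""
scope = {"search": search}
exec(code_action, scope)
print(result, scope["total"])   # 4 11
```

One code action replaces a whole round-trip of JSON calls, which is exactly the composition point.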

  • AGENTS.md paper: extra instructions often hurt performance but increase exploration and testing.

Aha moment

  • Consolidation is essentially compilation: how do you bake an insight into a cheaper representation for future use?
  • The bottleneck keeps showing up as the same costs: context limits, inference-time attention, or fine-tuning.
  • Everyone has invented a way to “compress the lesson”, whether into code, weights, or a policy.
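The consolidation-as-compilation loop can be sketched as a toy pipeline. The environment, the "lesson", and the dict-lookup artifact are all illustrative stand-ins:

```python
# experience -> reflection -> consolidation, end to end, in miniature.

def experience(task):
    """Act in the environment and record what happened."""
    return [(x, 2 * x + 1) for x in task]   # toy env: x -> 2x + 1

def reflect(trajectory):
    """Turn raw experience into an explicit lesson."""
    return {x: y for x, y in trajectory}

def consolidate(lesson):
    """Bake the lesson into a cheap deploy-time artifact: here, a dict lookup."""
    return lesson.get                        # no model needed at decision time

policy = consolidate(reflect(experience(range(5))))
print(policy(3))                             # 7
```

The compiled `policy` answers instantly for anything it has seen, which is the whole point: the expensive loop runs once, the cheap artifact runs forever.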

What still feels messy

  • Everything here is about compilation (cheaper reuse of lessons), not about learning higher-quality lessons in the first place.

Next step

  • Think about parallels with existing build systems and how they might self-optimize.