PaperClub - Learning as Compilation
What I worked on
Read a set of papers that all circle the same loop: experience → reflection → consolidation. A mix of RL, agents, long-context, and fast-adaptation methods.
| Title | Authors | Year | Link |
|---|---|---|---|
| AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness | Lou et al. | 2026 | https://doi.org/10.48550/arXiv.2603.03329 |
| Text-to-LoRA: Instant Transformer Adaptation | Charakorn et al. | 2025 | https://doi.org/10.48550/arXiv.2506.06105 |
| Experiential Reinforcement Learning | Shi et al. | 2026 | https://doi.org/10.48550/arXiv.2602.13949 |
| Recursive Language Models | Zhang et al. | 2025 | https://doi.org/10.48550/arXiv.2512.24601 |
| Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? | Gloaguen et al. | 2026 | https://doi.org/10.48550/arXiv.2602.11988 |
| Executable Code Actions Elicit Better LLM Agents | Wang et al. | 2024 | https://doi.org/10.48550/arXiv.2402.01030 |
What I noticed
- Different subfields are converging on the same pattern from different directions: Experience → Reflection → Consolidation → Deployment
  - RL people call it experiential learning
  - Agent people call it reflection
  - LLM people call it distillation or adapters
  - Systems people call it harnessing
- AutoHarness: a small model writes guardrail code or a full policy that outperforms bigger models. Once the behavior is compiled into code, you no longer need the LLM at decision time.
- ERL: an explicit reflection step converts feedback into behavioral updates, then internalizes the improvement so inference stays cheap.
- RLM: treat long context as an environment, not an input. The model navigates it instead of ingesting it.
- Text-to-LoRA: train hypernetworks that convert task descriptions directly into adapters in one forward pass. Basically, compile experience into weights without full fine-tuning.
- CodeAct: executable code is a better action space than JSON because it preserves state, control flow, and composition. Actions become programs, not tokens.
- AGENTS.md paper: extra repository-level instructions often hurt performance, even though they increase exploration and testing.
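The ERL loop above (experience → reflection → consolidation) can be sketched in a few lines. This is a minimal illustration, not the paper's method: the function names, the string-based "rules", and the dedup step are all invented stand-ins for what would be LLM calls and weight updates.

```python
# Hedged sketch of an experience -> reflection -> consolidation loop.
# All names are illustrative; in ERL proper these steps involve a model.

def reflect(feedback: str) -> str:
    """Turn raw feedback into a behavioral rule (an LLM call in practice)."""
    return f"Rule: avoid '{feedback}' next time"

def consolidate(lessons: list[str], rule: str) -> list[str]:
    """Internalize the rule; dedupe so the lesson store stays small."""
    return lessons if rule in lessons else lessons + [rule]

lessons: list[str] = []
for feedback in ["off-by-one loop bound",
                 "off-by-one loop bound",   # repeated failure, same lesson
                 "unhandled None"]:
    lessons = consolidate(lessons, reflect(feedback))

# Two distinct lessons survive; inference only reads the compact store.
```

The point of the consolidation step is exactly the "inference stays cheap" property: repeated experiences collapse into one stored lesson rather than growing the context.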
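The RLM idea of navigating context rather than ingesting it can be sketched as a tiny environment. This is an assumption-laden toy, not the paper's interface: `search` and `read` are invented tool names standing in for whatever navigation primitives the model is given.

```python
# Hedged sketch: long context as an environment the model explores with
# tools, instead of an input it must attend over in full.

class ContextEnv:
    def __init__(self, text: str, window: int = 80):
        self.text = text
        self.window = window

    def search(self, query: str) -> int:
        """Return the offset of the first match, or -1 if absent."""
        return self.text.find(query)

    def read(self, offset: int) -> str:
        """Return only a small window of text around an offset."""
        return self.text[offset : offset + self.window]

# A "long" document; only a tiny window of it ever reaches the model.
doc = "..." * 1000 + "The answer is 42." + "..." * 1000
env = ContextEnv(doc)
pos = env.search("answer is")
snippet = env.read(pos)   # this window, not the whole doc, enters the prompt
```

Attention cost then scales with the window the model chooses to read, not with the document length.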
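The Text-to-LoRA bullet can be made concrete with a toy hypernetwork. Everything here is illustrative: the linear (and untrained) hypernetwork, the shapes, and the random "task embedding" are assumptions, not the paper's architecture; the point is only the shape of the computation, description → low-rank factors in one pass.

```python
import numpy as np

# Hedged sketch: a hypernetwork maps a task-description embedding to LoRA
# factors A and B in a single forward pass; no gradient steps on the base model.

d, r, e = 16, 4, 8             # hidden dim, LoRA rank, task-embedding dim
rng = np.random.default_rng(0)

# Hypernetwork parameters: one linear map per LoRA factor (toy, untrained).
H_A = rng.normal(size=(e, r * d)) * 0.01
H_B = rng.normal(size=(e, d * r)) * 0.01

def hypernet(task_emb):
    """Task embedding -> LoRA factors (A, B)."""
    A = (task_emb @ H_A).reshape(r, d)
    B = (task_emb @ H_B).reshape(d, r)
    return A, B

W = rng.normal(size=(d, d))    # frozen base weight
task_emb = rng.normal(size=e)  # stand-in for an embedded task description
A, B = hypernet(task_emb)
W_adapted = W + B @ A          # rank-r update "compiled" from the description
```

The adapter is a rank-`r` patch on a frozen weight, which is what makes this "experience compiled into weights without full fine-tuning".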
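The CodeAct point about state, control flow, and composition can be shown with a minimal executor. This is a sketch of the general idea, not the paper's harness: the class name and the `result` convention are made up, and a real system would sandbox the `exec`.

```python
# Hedged sketch: code actions run in a persistent namespace, so state and
# control flow carry across turns -- unlike stateless one-shot JSON tool calls.

class CodeActEnv:
    """Minimal environment that runs model-emitted Python as actions."""

    def __init__(self):
        self.ns = {}  # persistent namespace: variables survive between actions

    def step(self, code_action: str):
        exec(code_action, self.ns)          # run the action program (unsandboxed!)
        return self.ns.get("result")        # convention: actions write `result`

env = CodeActEnv()
env.step("data = [3, 1, 2]")                # turn 1 creates state
out = env.step("result = sorted(data)[0]")  # turn 2 composes on that state
```

A JSON action space would need one round trip per primitive call and an external place to stash `data`; here the action itself is a small program.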
Aha moment
- Consolidation is essentially compilation: how do you bake an insight into a cheaper representation for future use?
- The bottleneck keeps showing up in the same places: context limits, attention cost at inference, and fine-tuning cost at training time.
- Everyone has invented a way to "compress the lesson", whether into code, weights, or a policy.
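The "compress the lesson" idea above, in its cheapest form: once an expensive process (an LLM, an RL loop, a human) has discovered a rule, it can be compiled into a cheap artifact and the expensive model is no longer needed at decision time. The guardrail rule below is a made-up example, not from any of the papers.

```python
import re

# Hedged illustration: an "insight" distilled into a regex guardrail.
# The pattern itself is a hypothetical lesson, compiled once, run cheaply.

LESSON = r"\b(?:password|secret|api[\s_-]?key)\b"   # lesson from experience
GUARD = re.compile(LESSON, re.IGNORECASE)           # compiled artifact

def allow(message: str) -> bool:
    """Deterministic guardrail distilled from prior experience."""
    return GUARD.search(message) is None

assert allow("deploy the service")
assert not allow("here is my API key")
```

At deployment only the compiled artifact runs, which is the AutoHarness-style payoff: the lesson's cost is paid once, at "compile time".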
What still feels messy
- The framing is all about compilation, i.e. making the lesson cheaper to apply, not about higher-quality learning in the first place.
Next step
- Think about parallels with existing build systems and how they might self-optimize