Structured Tooling for Sandboxed Agents

What I worked on

  • Tried running GEPA against real workloads, which immediately required moving all execution into a sandbox, using tools, and returning artifacts that can be applied and evaluated
  • Experimented with a different approach to tool interaction: a typed, structured action surface that I think will be easier to reason over. It lets me lean on the compiler to enforce shape and gives me more control. The alternative would be a DSL or a pile of JSON glue
  • The agent uses the SDK as its action surface inside the sandbox and executes through C# script (csx). This required a custom csx runner to preload dependencies and remove setup friction
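
To make the "typed action surface" idea concrete, here is a minimal sketch. It is written in TypeScript purely for illustration (the SDK in this post is C#), and every name in it (`Action`, `Artifact`, `Sandbox`, the three action kinds) is hypothetical rather than taken from the repo. The point is only that the compiler rejects malformed actions and unhandled cases, which loose JSON glue cannot do.

```typescript
// Hypothetical action surface: all names are illustrative, not the post's SDK.
type Action =
  | { kind: "readFile"; path: string }
  | { kind: "writeFile"; path: string; content: string }
  | { kind: "runCommand"; command: string; args: string[] };

// An artifact is what the sandbox hands back for later evaluation.
interface Artifact {
  action: Action["kind"];
  summary: string;
}

// In-memory stand-in for a sandbox; nothing touches the real file system.
class Sandbox {
  private files = new Map<string, string>();

  execute(action: Action): Artifact {
    switch (action.kind) {
      case "readFile": {
        const content = this.files.get(action.path) ?? "";
        return { action: action.kind, summary: `read ${content.length} chars from ${action.path}` };
      }
      case "writeFile":
        this.files.set(action.path, action.content);
        return { action: action.kind, summary: `wrote ${action.content.length} chars to ${action.path}` };
      case "runCommand":
        // Commands are only recorded in this sketch, never executed.
        return { action: action.kind, summary: `recorded ${[action.command, ...action.args].join(" ")}` };
      default: {
        // Exhaustiveness check: a new Action variant without a case fails to compile.
        const unreachable: never = action;
        throw new Error(`unhandled action: ${JSON.stringify(unreachable)}`);
      }
    }
  }
}

const sandbox = new Sandbox();
const writeResult = sandbox.execute({ kind: "writeFile", path: "notes.txt", content: "hello" });
const readResult = sandbox.execute({ kind: "readFile", path: "notes.txt" });
```

Passing `{ kind: "writeFile", path: "notes.txt" }` without `content` would be a compile error here, which is the kind of shape enforcement a DSL or JSON layer would have to re-implement at runtime.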

Repo here

What I noticed

  • It is still not possible to run GEPA end to end on a real workload: the agent depends on a context layer that doesn't exist yet
  • Reviewing generated code is starting to feel like the wrong bottleneck
  • For this setup it makes more sense to focus on what the user wants to do rather than how the agent implements it, so I’m experimenting with acceptance testing over code review or unit-style checks
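
As a rough illustration of acceptance testing over code review, the sketch below checks only the state an agent run leaves behind, never the code that produced it. It is TypeScript for illustration only, and all names (`Workspace`, `AcceptanceCheck`, `runAcceptance`) and the two sample checks are assumptions, not part of the actual setup.

```typescript
// Hypothetical types: the workspace is the observable state after the agent has run.
type Workspace = Map<string, string>;

interface AcceptanceCheck {
  description: string; // phrased as the user's goal, not as an implementation detail
  passes: (ws: Workspace) => boolean;
}

interface CheckResult {
  description: string;
  passed: boolean;
}

// Evaluate outcomes only; how the agent produced them is never inspected.
function runAcceptance(ws: Workspace, checks: AcceptanceCheck[]): CheckResult[] {
  return checks.map((c) => ({ description: c.description, passed: c.passes(ws) }));
}

const checks: AcceptanceCheck[] = [
  {
    description: "README mentions the install command",
    passes: (ws) => (ws.get("README.md") ?? "").includes("npm install"),
  },
  {
    description: "a changelog exists",
    passes: (ws) => ws.has("CHANGELOG.md"),
  },
];

// Stand-in for the files an agent run left in the sandbox.
const afterRun: Workspace = new Map([["README.md", "Run npm install first."]]);

const results = runAcceptance(afterRun, checks);
const failed = results.filter((r) => !r.passed).map((r) => r.description);
```

With this sample workspace the first check passes and the second fails, so `failed` holds the one unmet user goal regardless of how the agent wrote the files.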

“Aha” Moment

N/A

What still feels messy

N/A

Next step

  • Introduce a minimal context layer so tool usage and execution paths can be validated in practice