Structured Tooling for Sandboxed Agents

What I worked on

Tried running GEPA against real workloads, which immediately required moving all execution into a sandbox, using tools, and returning artifacts that can be applied and evaluated
Experimented with a different approach to tool interaction: a typed, structured action surface, which I think will be easier to reason over. The compiler enforces the shape of each action, which gives me more control than the alternatives, a DSL or a pile of JSON glue
The agent uses the SDK as the action surface inside the sandbox and executes its actions as C# scripts. This required a custom csx runner that preloads dependencies and removes setup friction
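
A minimal sketch of what a typed action surface buys over JSON glue. It is written in TypeScript as a stand-in for the C# the setup actually uses, and every name in it (Action, Sandbox, the three action kinds) is hypothetical and not taken from the SDK. The point is only that a malformed action fails at compile time instead of at run time inside the sandbox.

```typescript
// Hypothetical action shapes; none of these names come from the real SDK.
type Action =
  | { kind: "readFile"; path: string }
  | { kind: "writeFile"; path: string; content: string }
  | { kind: "runCommand"; command: string; args: string[] };

// A file the run produced, returned so it can be applied and evaluated.
type Artifact = { path: string; content: string };

// In-memory stand-in for the sandbox; a real one would isolate execution.
class Sandbox {
  private files: Record<string, string> = {};
  readonly log: string[] = [];

  execute(action: Action): string {
    switch (action.kind) {
      case "readFile":
        // Missing files read as an empty string in this sketch.
        return this.files[action.path] ?? "";
      case "writeFile":
        this.files[action.path] = action.content;
        return `wrote ${action.path}`;
      case "runCommand":
        // Commands are only recorded here, never actually run.
        this.log.push([action.command, ...action.args].join(" "));
        return "";
      default: {
        // Exhaustiveness check: adding an Action kind without handling
        // it above makes this assignment a compile error.
        const unreachable: never = action;
        throw new Error(`unhandled action: ${JSON.stringify(unreachable)}`);
      }
    }
  }

  // Everything written during the run, as a list of artifacts.
  artifacts(): Artifact[] {
    return Object.keys(this.files).map((path) => ({
      path,
      content: this.files[path],
    }));
  }
}
```

With this shape, a call such as execute({ kind: "writeFile", path: "a.txt" }) is rejected by the compiler because content is missing, which is the kind of check a JSON payload would only get from a hand-written validator.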

Repo here

Still not possible to run GEPA end to end on a real workload: the agent depends on a context layer that does not exist yet
Reviewing generated code is starting to feel like the wrong bottleneck
For this setup it makes more sense to focus on what the user wants to do rather than how the agent implements it, so I’m experimenting with acceptance testing over code review or unit-style checks
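
A rough sketch of what acceptance testing over returned artifacts could look like, again in TypeScript as a stand-in and with hypothetical names throughout (ArtifactSet, Criterion, evaluate, accepted). Each criterion describes an outcome the user cares about and never inspects how the agent got there. This is an illustration under those assumptions, not the checks actually used.

```typescript
// Hypothetical artifact shape: file path mapped to file content.
type ArtifactSet = Record<string, string>;

// A criterion states a user-visible outcome, not an implementation detail.
interface Criterion {
  description: string;
  check: (artifacts: ArtifactSet) => boolean;
}

interface CriterionResult {
  description: string;
  passed: boolean;
}

// Run every criterion against the artifacts the agent returned.
function evaluate(
  artifacts: ArtifactSet,
  criteria: Criterion[]
): CriterionResult[] {
  return criteria.map((c) => ({
    description: c.description,
    passed: c.check(artifacts),
  }));
}

// The run is accepted only if every criterion passed.
function accepted(results: CriterionResult[]): boolean {
  return results.every((r) => r.passed);
}

// Example criteria for an imagined "add a README" request.
const readmeCriteria: Criterion[] = [
  {
    description: "a README is produced",
    check: (a) => "README.md" in a,
  },
  {
    description: "the README explains how to install",
    check: (a) => (a["README.md"] ?? "").toLowerCase().indexOf("install") >= 0,
  },
];
```

The results list keeps one entry per criterion, so a failed run still says which user-facing expectation was missed without anyone reading the generated code.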

N/A

N/A

Introduce a minimal context layer so tool usage and execution paths can be validated in practice