Technical companion to /projects/agent. For someone designing tool-using LLM agents and worried about both safety and debuggability.
I Use This When...
I want a tool-using agent whose entire reasoning trace I can read — and a Python sandbox that doesn't crash the box when the LLM goes off-script.
Why plan-act-observe, not single completion
A single completion can't ground itself on output it hasn't seen yet.
The observation step lets the model self-correct on errors — if
run_python returned a NameError, the next step's prompt includes
that error, so the LLM rewrites and retries instead of giving up.
The loop has a hard ceiling (MAX_STEPS = 8 in api.py) so a runaway
LLM can't burn the rate-limit on a single task. When the model
responds without a tool call, that response is the final answer.
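A minimal sketch of the loop described above, not the code in api.py. Only MAX_STEPS = 8 comes from the text; run_agent, scripted_llm, fake_run_python and the shape of the model reply (a dict with "tool", "input", "text") are hypothetical stand-ins so the example runs without a real model.

```python
from typing import Callable, Optional

MAX_STEPS = 8  # the ceiling named in the text; everything else here is illustrative


def run_agent(
    task: str,
    llm: Callable[[list[str]], dict],
    tools: dict[str, Callable[[str], str]],
) -> Optional[str]:
    """Plan-act-observe loop: call the model, run its tool, feed the result back."""
    history = [f"task: {task}"]
    for _ in range(MAX_STEPS):
        reply = llm(history)
        if reply.get("tool") is None:
            # No tool call means the model is done: this is the final answer.
            return reply["text"]
        observation = tools[reply["tool"]](reply["input"])
        # The observation (including any error text) lands in the next prompt.
        history.append(f"call: {reply['tool']}({reply['input']!r})")
        history.append(f"observation: {observation}")
    return None  # step budget exhausted without a final answer


def scripted_llm(history: list[str]) -> dict:
    """Stand-in model: retries once after seeing a NameError, then answers."""
    if not any(h.startswith("observation:") for h in history):
        return {"tool": "run_python", "input": "print(answer)"}
    if "NameError" in history[-1]:
        return {"tool": "run_python", "input": "answer = 42\nprint(answer)"}
    return {"tool": None, "text": "The answer is 42."}


def fake_run_python(code: str) -> str:
    """Stand-in tool: the first snippet fails, the corrected one succeeds."""
    return "42" if "answer = 42" in code else "NameError: name 'answer' is not defined"


result = run_agent("compute the answer", scripted_llm, {"run_python": fake_run_python})
```

Here the scripted model needs three steps: a failing call, a corrected call, and a plain-text reply that ends the loop. A model that never stops calling tools gets None back after MAX_STEPS iterations.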
Why AST validation before runtime, not only at runtime
sandbox.py validates the user's Python at parse time, before any
code runs:
- Imports outside an allowlist (math, random, statistics, datetime, itertools, json, re, collections, fractions, decimal) are rejected. from ... import is checked the same way.
- Any attribute access starting with _ is rejected, which kills the classic ().__class__.__base__.__subclasses__() escape chain.
- A name allowlist plus a banned-names list (exec, eval, compile, open, __import__, globals, locals, vars, getattr, setattr, delattr, breakpoint, memoryview, object, super) blocks reflection and IO primitives.
Runtime-only checks are easier to bypass, e.g. by monkey-patching builtins before they're called. AST-time rejection is the layer that has to refuse every escape attempt at parse time, before any of the submitted code gets a chance to run.
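The three parse-time rules can be sketched with the standard ast module. This is an illustration, not the contents of sandbox.py: the two sets are copied from the lists above, while validate and its message strings are hypothetical, and the name allowlist is left out for brevity (only the banned-names half is shown).

```python
import ast

ALLOWED_IMPORTS = {"math", "random", "statistics", "datetime", "itertools",
                   "json", "re", "collections", "fractions", "decimal"}
BANNED_NAMES = {"exec", "eval", "compile", "open", "__import__", "globals",
                "locals", "vars", "getattr", "setattr", "delattr",
                "breakpoint", "memoryview", "object", "super"}


def validate(source: str) -> list[str]:
    """Return a list of violations; an empty list means the code may run."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        # Collect the module names touched by either import form.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # "" for a bare relative import
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {module or '(relative)'}")
        # Any attribute starting with "_" covers __class__, __base__, etc.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            problems.append(f"private attribute access: {node.attr}")
        if isinstance(node, ast.Name) and node.id in BANNED_NAMES:
            problems.append(f"banned name: {node.id}")
    return problems
```

Nothing is executed at this stage: validate("().__class__.__base__.__subclasses__()") reports three private attribute accesses purely from the parse tree.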
Why two layers of defence in depth on run_python
The sandbox is invoked as a subprocess (subprocess/SANDBOX_PATH)
so even if the in-process restrictions are escaped, the agent's
process tree limits — wall-clock timeout, CPU time, address space
when run as __main__ — bound the blast radius. The agent process
also runs the validation; the subprocess re-validates. Two layers
mean the threat model is "find a bypass that works against the AST
validator and the restricted builtins and the OS-level limits".
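A rough sketch of the OS-level layer, assuming a POSIX host. It runs python -c directly as a stand-in for the real sandbox script at SANDBOX_PATH, and the specific limits (2 CPU seconds, 1 GiB of address space, 5 seconds of wall clock) are invented for illustration. run_isolated, _cap and _apply_limits are hypothetical names.

```python
import resource
import subprocess
import sys


def _cap(limit: int, value: int) -> None:
    """Lower a resource limit to `value`, never above the existing hard limit."""
    _, hard = resource.getrlimit(limit)
    new = value if hard == resource.RLIM_INFINITY else min(value, hard)
    resource.setrlimit(limit, (new, new))


def _apply_limits() -> None:
    # Runs in the child between fork and exec (POSIX only).
    _cap(resource.RLIMIT_CPU, 2)        # CPU seconds (assumed value)
    _cap(resource.RLIMIT_AS, 1 << 30)   # address space in bytes (assumed value)


def run_isolated(code: str, timeout_s: float = 5.0) -> str:
    """Run code in a child interpreter; return stdout or an error string."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,                   # wall-clock ceiling
            preexec_fn=_apply_limits,
        )
    except subprocess.TimeoutExpired:
        return "error: wall-clock timeout"
    if proc.returncode != 0:
        return f"error: {proc.stderr.strip()}"
    return proc.stdout
```

The limits are applied in the child, so they hold even if the code inside finds a way around the validator and the restricted builtins. They bound resource use only; they do not restrict which files or syscalls the child can reach.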
Why stream the entire reasoning trace
Agents fail in interesting ways. Hiding the trace turns every failure into a black box that the developer has to reproduce. Streaming each reasoning step, tool call, and observation as they happen turns failures into evidence — the reviewer sees exactly which step went sideways and what observation followed.
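One way to picture the stream is a generator that yields each event as a JSON line the moment it exists. The event kinds ("reasoning", "tool_call", "observation"), the field names and the JSON-lines framing are all assumptions made for this sketch, as are trace_event and stream_step.

```python
import json
import time
from typing import Callable, Iterator


def trace_event(kind: str, step: int, payload: str) -> str:
    """Serialise one trace event as a single JSON line."""
    return json.dumps({"ts": time.time(), "step": step, "kind": kind, "payload": payload})


def stream_step(
    step: int,
    thought: str,
    tool: str,
    tool_input: str,
    run_tool: Callable[[str], str],
) -> Iterator[str]:
    """Yield events in order; the tool call is emitted before the tool runs."""
    yield trace_event("reasoning", step, thought)
    yield trace_event("tool_call", step, f"{tool}: {tool_input}")
    # The tool only executes when the consumer asks for the next event.
    yield trace_event("observation", step, run_tool(tool_input))


events = [
    json.loads(line)
    for line in stream_step(1, "I should compute 2 + 2.", "run_python",
                            "print(2 + 2)", lambda code: "4")
]
```

Because the tool call is yielded before the tool runs, a reviewer watching the stream sees what was attempted even if the tool then hangs or fails.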
Why errors are first-class context, not crashes
A tool error becomes the next observation, not an exception. That
forces the agent to actually read what went wrong before its next
move. "Can the agent recover from being told its last code raised
NameError: name 'answer' is not defined?" is the capability worth
demonstrating; "does the agent never make mistakes?" is not.
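In code, "error as observation" can be as small as a wrapper that never lets a tool exception escape. This is a sketch under assumptions: call_tool and the "error: " prefix are hypothetical, and lookup is a toy tool that exists only to raise the NameError from the example above.

```python
from typing import Callable


def call_tool(tool: Callable[[str], str], tool_input: str) -> str:
    """Run a tool and always return a string observation; never raise."""
    try:
        return tool(tool_input)
    except Exception as exc:
        # The model sees the same "Type: message" line a developer would.
        return f"error: {type(exc).__name__}: {exc}"


def lookup(name: str) -> str:
    """Toy tool: resolves a variable name, or fails like Python would."""
    variables = {"x": "1"}
    if name not in variables:
        raise NameError(f"name '{name}' is not defined")
    return variables[name]


observation = call_tool(lookup, "answer")
```

The returned string goes into the next prompt unchanged, so the recovery behaviour depends entirely on the model reading it.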
What broke first
TODO: the first version's run_python returned variable bindings but not stdout. The LLM wrote print(answer) and the observation was None. Add the exact fix (probably contextlib.redirect_stdout over an io.StringIO).
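Until the exact fix is written up, here is one plausible shape for it, following the guess in the TODO. It is not the project's actual patch: run_and_capture is a hypothetical name, and the bare exec stands in for whatever restricted execution sandbox.py really performs.

```python
import contextlib
import io


def run_and_capture(code: str) -> str:
    """Execute code and return everything it printed to stdout."""
    buffer = io.StringIO()
    namespace: dict = {}
    with contextlib.redirect_stdout(buffer):
        # Placeholder for the sandbox's restricted execution step.
        exec(compile(code, "<agent>", "exec"), namespace)
    return buffer.getvalue()


output = run_and_capture("answer = 6 * 7\nprint(answer)")
```

With stdout captured, print(answer) produces an observation of "42\n" rather than None.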
What I'd rebuild
- Move the sandbox into a separate process with seccomp filters and cgroup resource limits, not just in-process restricted builtins.
- Add a planner step that emits a multi-step plan first, then executes; a useful comparison against the current single-step loop.
- Cap tool output length explicitly so a runaway print can't blow the next prompt's context window.
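The output cap in the last item could look something like the sketch below. The 2000-character default and the choice to keep both the head and the tail of the output are assumptions, and cap_output is a hypothetical helper.

```python
MAX_OBSERVATION_CHARS = 2000  # assumed budget, not a value from the project


def cap_output(text: str, limit: int = MAX_OBSERVATION_CHARS) -> str:
    """Keep the first and last parts of long output and note what was dropped."""
    if len(text) <= limit:
        return text
    head_len = limit // 2
    tail_len = limit - head_len
    omitted = len(text) - limit
    head = text[:head_len]
    tail = text[len(text) - tail_len:]
    return f"{head}\n... [{omitted} characters omitted] ...\n{tail}"


capped = cap_output("x" * 10_000, limit=100)
```

Keeping the tail matters for Python output in particular, because the final line of a traceback is usually the part the model needs to read.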