Agentic AI · live

Autonomous AI Agent

A tool-using agent that plans, acts, observes and self-corrects.

AST-sandboxed Python · self-corrects · streamed trace

The problem

Most demo agents either crash on the first tool error or hide their reasoning. This one runs an explicit plan → act → observe loop, streams every reasoning step and tool call to the browser, treats tool errors as new context, and tries again. The Python sandbox is hardened with defence in depth, described under Architecture.

Who this is for

Anyone building or evaluating LLM agents, especially around tool-use safety and observability.

Architecture

Plan-act-observe loop
On each iteration, the LLM emits a short rationale plus one tool call; the tool's output becomes the next observation.
Calculator tool
AST-parsed arithmetic: operators, math.* functions, and constants only. No other name lookups and no calls outside the whitelist.
run_python tool
Restricted Python sandbox with defence in depth: AST validation rejects non-whitelisted imports, dunder access, and exec / eval / open before any code runs, and execution itself uses restricted builtins.
Streaming UI
Every reasoning step, tool call, and observation is pushed to the browser as it happens, so the agent's failures are visible.
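
As an illustration of the calculator tool described above, here is a minimal sketch of whitelist-based AST evaluation. It is not the project's actual code: the names safe_calc, _eval and _math_attr, and the exact sets of allowed operators, functions and constants, are assumptions made for this example.

```python
import ast
import math
import operator

# Whitelisted operators (assumed set; the real tool may allow more or fewer).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

# Whitelisted math.* functions and constants (illustrative subset).
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "log": math.log}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_calc(expression: str) -> float:
    """Evaluate an arithmetic expression by walking its AST; eval() is never called."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


def _math_attr(node):
    """Return the attribute name if node is exactly `math.<name>`, else None."""
    if (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "math"):
        return node.attr
    return None


def _eval(node):
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Call) and not node.keywords:
        name = _math_attr(node.func)
        if name in _FUNCS:
            return _FUNCS[name](*[_eval(arg) for arg in node.args])
        raise ValueError("function is not in the whitelist")
    name = _math_attr(node)
    if name in _CONSTS:
        return _CONSTS[name]
    # Anything else (bare names, subscripts, lambdas, ...) is rejected.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


print(safe_calc("math.sqrt(16) + 2 * 3"))
```

Because every node type has to be matched explicitly, anything not listed falls through to the final ValueError. A production version would probably also cap exponent size and expression depth, which this sketch does not do.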

Request / data flow

  1. User gives a task → LLM receives task + tool schemas.
  2. LLM emits reasoning + tool call → tool runs.
  3. Output (or error) is fed back as an observation.
  4. Loop continues up to a step budget; when the LLM responds without a tool call, that response is final.
  5. If a tool returned an error, the agent reads it on the next step and adjusts (e.g. code that printed nothing or threw).
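
The flow above can be sketched as a small loop. This is a simplified illustration, not the project's implementation: the Turn shape, the run_agent signature, the string-based context and the scripted stand-in for the LLM are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Turn:
    """One model response (hypothetical shape): reasoning plus an optional tool call."""
    reasoning: str
    tool: Optional[str] = None   # None means "this is the final answer"
    arg: str = ""


def run_agent(task: str,
              llm: Callable[[List[str]], Turn],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 8) -> str:
    """Plan, act, observe until the model stops calling tools or the budget runs out."""
    context = [f"task: {task}"]
    for _ in range(max_steps):
        turn = llm(context)
        context.append(f"reasoning: {turn.reasoning}")
        if turn.tool is None:
            return turn.reasoning          # no tool call: the response is final
        try:
            if turn.tool not in tools:
                raise KeyError(f"unknown tool {turn.tool!r}")
            observation = tools[turn.tool](turn.arg)
        except Exception as exc:           # errors become context, not crashes
            observation = f"error: {type(exc).__name__}: {exc}"
        context.append(f"observation: {observation}")
    return "stopped: step budget exhausted"


# A scripted stand-in for the LLM: it makes a bad call, reads the error, then retries.
def scripted_llm(context: List[str]) -> Turn:
    last = context[-1]
    if last.startswith("task:"):
        return Turn("try dividing", tool="divide", arg="1/0")
    if last.startswith("observation: error"):
        return Turn("that divided by zero; retry with a valid divisor",
                    tool="divide", arg="1/4")
    return Turn("The answer is " + last.split(": ", 1)[1])


def divide(arg: str) -> str:
    numerator, denominator = arg.split("/")
    return str(int(numerator) / int(denominator))


print(run_agent("compute a quarter", scripted_llm, {"divide": divide}))
```

The first tool call raises ZeroDivisionError, which is appended to the context as an observation instead of aborting the run; the scripted model sees it on the next step and issues a corrected call.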

Key decisions

AST validation before execution, not just at runtime.

Why: Catching imports / dunder access at parse time is much harder to bypass than runtime checks alone.
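
A parse-time check of this kind might look like the sketch below. The validate function, the ALLOWED_IMPORTS whitelist and the BANNED_NAMES set are hypothetical; only exec, eval and open are named by the project, while compile and __import__ are added here as an assumption. It covers the validation layer only and does not show the restricted-builtins execution step.

```python
import ast
from typing import List

ALLOWED_IMPORTS = {"math", "statistics", "itertools"}              # assumed whitelist
BANNED_NAMES = {"exec", "eval", "open", "compile", "__import__"}   # partly assumed


def validate(source: str) -> List[str]:
    """Return a list of violations; an empty list means the code may go on to execution."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected outright.
            if node.level or root not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {node.module}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"dunder access not allowed: {node.attr}")
        elif isinstance(node, ast.Name) and (
                node.id in BANNED_NAMES or node.id.startswith("__")):
            problems.append(f"name not allowed: {node.id}")
    return problems


print(validate("import math\nprint(math.sqrt(2))"))
print(validate("import os"))
print(validate("().__class__"))
```

The violation list can be returned to the agent as an error observation, so a rejected snippet becomes something the model can read and correct. A static check like this is one layer only: it cannot see names assembled from strings at runtime, which is where the restricted builtins come in.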

Stream the entire reasoning trace.

Why: Agents fail in interesting ways. Hiding the trace turns every failure into a mystery; streaming it turns failures into debuggable evidence.
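
One way to stream such a trace is Server-Sent Events. The project does not state which transport it uses, so SSE is an assumption here, as are the sse_frames helper, the event names and the payload fields.

```python
import json
from typing import Iterable, Iterator, Tuple


def sse_frames(events: Iterable[Tuple[str, dict]]) -> Iterator[str]:
    """Encode (kind, payload) pairs as Server-Sent Events frames, one per trace step."""
    for kind, payload in events:
        # json.dumps escapes newlines, so each `data:` field stays on a single line.
        yield f"event: {kind}\ndata: {json.dumps(payload)}\n\n"


# Hypothetical trace for a single agent step.
trace = [
    ("reasoning", {"step": 1, "text": "I should compute this with the calculator."}),
    ("tool_call", {"step": 1, "tool": "calculator", "input": "2 ** 10"}),
    ("observation", {"step": 1, "output": "1024"}),
]

frames = list(sse_frames(trace))
for frame in frames:
    print(frame, end="")
```

In a FastAPI backend, a generator like this could be passed to StreamingResponse with media_type="text/event-stream", and the browser could consume it with EventSource. Tagging each frame with an event kind lets the UI render reasoning, tool calls and observations differently as they arrive.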

Errors are first-class context, not crashes.

Why: The agent's ability to recover from a wrong tool call is the actual capability worth demonstrating.

Stack

LLM · Agentic AI · Tool Use · Sandbox · FastAPI · Next.js

If I rebuilt it

  • Add a planner step that emits a multi-step plan first, then executes it. This would be a useful comparison against the current single-step loop.
  • Move the sandbox into a separate process with seccomp + cgroup limits instead of in-process restricted builtins.