I've been building Agent GTD, a task management system designed around AI agents. One of its features is a "Send to Claude" button — click it, and a headless Claude Code agent clones the repo, works on the task autonomously, pushes a feature branch, and posts its results as a comment. No human in the loop until review time.
It worked. But it had a problem: the agent ran on the same server as the web app.
## The security problem
When you run a headless AI coding agent on your production server, that agent inherits your world. Same SSH keys that access every machine on the network. Same filesystem with production configs and secrets. Same database connection. The agent is sandboxed by Claude Code's own guardrails and a carefully worded system prompt — but "the system prompt says not to" is not a security boundary.
The threat model isn't that Claude goes rogue. It's more mundane: prompt injection via a compromised dependency, a malicious CLAUDE.md in a cloned repo, or just an agent that makes an honest mistake with access it shouldn't have had. The blast radius of a coding agent should be limited to what it needs: clone a repo, write code, run tests, push a branch. Nothing else.
## The solution: a separate machine
I had a Raspberry Pi 5 sitting in a Pironman 5 Max case — an aluminum enclosure with NVMe storage and active tower cooling that keeps the Pi at 39°C under full CPU load. It runs Claude Code well. It became the dispatch worker.
The architecture is straightforward:
```mermaid
graph TD
    A["Web server<br/>(GTD backend)<br/>- web app<br/>- database"] -- "POST /dispatch<br/>HTTPS" --> B
    B -- "comments + status" --> A
    B["Raspberry Pi 5<br/>(dispatch API)<br/>user: dispatch<br/>- no sudo<br/>- restricted SSH<br/>- claude CLI"] -- "git clone/push" --> C["Git server<br/>(bare repos)"]
```
The GTD backend calls `POST /dispatch` on the Pi. The Pi's dispatch API — a small FastAPI service — clones the repo, runs Claude Code with a system prompt describing the task, and reports back. The backend polls for status and publishes real-time events to the UI.
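The core of that worker can be sketched in a few lines. This is a minimal sketch, not the actual service: the helper names are mine, and while `-p`/`--print`, `--max-turns`, and `--dangerously-skip-permissions` are Claude Code CLI flags at the time of writing, check `claude --help` for the current set.

```python
import subprocess
from pathlib import Path

def build_claude_cmd(prompt: str, max_turns: int = 30) -> list[str]:
    """Assemble a headless Claude Code invocation (flag names assumed, verify locally)."""
    return [
        "claude", "-p", prompt,            # -p / --print: non-interactive mode
        "--max-turns", str(max_turns),     # cap the agent's turn budget
        "--dangerously-skip-permissions",  # no permission prompts in headless runs
    ]

def run_task(repo_url: str, branch: str, prompt: str, workdir: Path) -> str:
    """Clone the repo, run the agent headless on a fresh branch, push, return output."""
    repo = workdir / "repo"
    subprocess.run(["git", "clone", repo_url, str(repo)], check=True)
    subprocess.run(["git", "-C", str(repo), "checkout", "-b", branch], check=True)
    result = subprocess.run(
        build_claude_cmd(prompt),
        cwd=repo, capture_output=True, text=True, timeout=3600, check=True,
    )
    subprocess.run(["git", "-C", str(repo), "push", "origin", branch], check=True)
    return result.stdout
```

The FastAPI layer on top is just an endpoint that kicks this off in a background task and records the result for the backend to poll.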
## What the dispatch user can and can't do
The Pi runs a dedicated dispatch user with:
- No sudo. No password. Can't escalate.
- A restricted SSH key authorized on the git server, which runs `git-shell`, a shell that only allows git operations. No interactive login, no port forwarding, no tunneling.
- No database access. The dispatch user talks to the GTD backend over HTTPS to post comments. It never touches the production database directly.
- Claude Code in `--print` mode: non-interactive, permission-skipping, turn-limited.
The blast radius: a compromised agent can push a branch (which goes through code review) and post a comment (which is visible and auditable). That's it.
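On the git-server side, the key restriction comes down to one line in `authorized_keys`. A sketch, with an illustrative placeholder key and a scratch directory standing in for the real `~git/.ssh` (on the actual server the `git` user is also created with `git-shell` as its login shell, e.g. `useradd --create-home --shell "$(command -v git-shell)" git`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory standing in for /home/git/.ssh on the real server.
sshdir=$(mktemp -d)
pubkey="ssh-ed25519 AAAAC3...example dispatch@pi"   # placeholder public key

# The "restrict" option disables port forwarding, agent forwarding, X11
# forwarding, and PTY allocation for this key. Combined with git-shell as
# the login shell, the key can do git operations and nothing else.
printf 'restrict %s\n' "$pubkey" > "$sshdir/authorized_keys"
chmod 600 "$sshdir/authorized_keys"

cat "$sshdir/authorized_keys"
```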
## The infrastructure work
Getting here required migrating the entire git hosting setup. Previously, bare repos lived under my personal user account on a VM — the same SSH key that cloned repos could also log in and do anything. That had to change.
I created a dedicated git user with `git-shell` on the VM. Migrated 37 bare repos. Updated remotes on six machines across the LAN. Wrote scripts for the migration so it was repeatable. Stumbled through SSH agent forwarding issues, DNS resolution failures, and a bash `set -e` gotcha where `((counter++))` silently exits the script when the counter is zero (the post-increment expression evaluates to the old value, 0, which bash treats as falsy, so the command returns exit code 1 — classic).
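That gotcha is easy to reproduce, and the fixes are one-liners:

```shell
#!/usr/bin/env bash
set -e

counter=0

# ((counter++)) post-increments: the expression's value is the OLD counter (0).
# An arithmetic command whose value is 0 returns exit status 1, and under
# `set -e` that kills the whole script. Uncomment the next line to see it abort:
# ((counter++))

# Safe variants: pre-increment (the value is the NEW counter, 1, so status 0)...
((++counter))
# ...or do the arithmetic inside an assignment, which always returns status 0:
counter=$((counter + 1))

echo "counter=$counter"   # prints counter=2
```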
The migration was the kind of work that has to be done interactively. I was SSH-ing into machines, debugging host key mismatches, updating systemd services, testing connectivity. An autonomous agent couldn't have navigated this — too many machines, too many judgment calls, too much context about my specific network.
## When to use conversation vs. dispatch
This is the part I didn't expect to be the most important takeaway.
Building the dispatch system was a full-day interactive session. Debugging, architecture decisions, jumping between machines, adapting to things that didn't work. That's conversation work — it requires a human and an AI agent working together in real time, with the human providing context the agent can't discover on its own.
But once the system was built, I dispatched a test task: "Add tests for these uncovered lines in mcp_backend.py." The agent failed. Three times. First at 20 turns, then at 50 turns. It spent all its budget reading the codebase and never got to writing tests.
The problem wasn't the turn count. The problem was the spec.
"Add tests for 50 uncovered lines" is a bad ticket for a junior developer and it's a bad task for an autonomous agent. There's too much ambiguity, too much exploration required, too many decisions about what to test and how. The agent thrashed.
What works is what has always worked for delegating to capable but context-limited team members: clear scope, specific files, patterns to follow, and an explicit definition of done. "Shovel ready" for an AI agent means:
- What to do — specific, not vague
- Where to look — file paths, existing examples to copy
- How to verify — which command to run, expected output
- Boundaries — what NOT to touch
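Applied to the failed test task, a shovel-ready spec might look like this. The function name, file paths, and commands below are illustrative, not from the real codebase:

```markdown
## Task: cover the retry path in mcp_backend.py

**What:** Add pytest cases for `_retry_request` covering timeout and
non-200 responses. Do not refactor the function itself.

**Where:** `backend/mcp_backend.py` (the lines flagged by coverage);
follow the style of the existing tests in `tests/test_mcp_backend.py`.

**Verify:** `pytest tests/test_mcp_backend.py` passes and
`coverage report` shows the flagged lines covered.

**Boundaries:** Touch only the test file. No new dependencies.
```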
This realization reshapes how I think about working with AI agents. The interactive session shouldn't be where I build everything start to finish. It should be where I plan — develop ideas, make architectural decisions, write specs. Then dispatch the execution to autonomous agents working in parallel on isolated branches.
Interactive sessions become strategic. Dispatch handles the grunt work. The human is a tech lead, not a pair programmer.
## What's next
The dispatch mechanism works. The isolation is solid. The next challenge is making it easier to write good specs — structured templates, two-pass dispatch where the agent proposes a plan before executing, batch dispatch for parallel work. The tooling needs to make the planning-to-execution handoff as smooth as the dispatch button itself.
The rest is prompt engineering.