Agent sandboxes

Hand your agent a full Linux machine and walk away. Each task runs in its own boxd VM with root, internet, a public HTTPS URL, and a 100 GB disk that survives whatever the agent does to it. If a run goes sideways, fork the pre-task state and try again.

How it works

Every boxd VM is a KVM microVM with its own kernel, network stack, and disk. Not a container. The agent can run Docker, install kernel modules, edit /etc, restart systemd, open ports, and break the OS without taking down anything else.

Fresh boots take ~50ms. Forks land in ~160ms and inherit the parent’s exact disk, processes, and memory. Resume from suspend is sub-millisecond. So the loop “snapshot, hand off, fork on retry, destroy when done” actually feels instant.

Inside the VM the agent has the boxd CLI on its PATH, pre-authenticated by source IP. It can create siblings, exec into them, manage proxies, and list VMs without needing a key or a token. JSON output everywhere so the agent can parse what it ran.

That’s the whole pitch. The setup is one command.

Run a task

Boot a sandbox, hand a job to Claude Code non-interactively, get a structured result back:

boxd new --name=task-1 --json

RESULT=$(boxd exec task-1 --json \
  'claude -p --output-format json "Build a Flask API on port 8000 with /health" \
   --dangerously-skip-permissions 2>/dev/null')

SESSION_ID=$(echo "$RESULT" | jq -r '.output' | jq -r .session_id)

The agent’s working directory is on a 100 GB disk that persists across reboots. The build is live at https://task-1.boxd.sh the moment a port opens. To take over interactively, drop in and resume the session:

ssh task-1.boxd.sh
claude --resume "$SESSION_ID"

When you’re done, boxd destroy task-1 and the disk goes with it.
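If a script needs to know when the build is actually reachable, one option is to poll the public URL until it answers. The helper below is a sketch, not part of boxd: `wait_for_health` is a hypothetical name, and it assumes the app serves `/health` as requested in the prompt above and that `curl` is installed where the script runs.

```shell
# Poll a URL until it answers without an HTTP error, or give up.
# curl -f treats 4xx/5xx responses as failures; --max-time bounds each try.
wait_for_health() {
  local url="$1"
  local attempts="${2:-30}"   # how many times to try
  local delay="${3:-2}"       # seconds to sleep between tries
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_health https://task-1.boxd.sh/health` returns 0 once the endpoint responds and 1 if it never does within the attempt budget.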

Patterns

Fork before risky ops

Fork the VM before the agent does something destructive, and run the risky step in the fork. If the run fails, destroy the fork, fork the parent again, and retry. The parent never changes.

boxd fork task-1 --name=task-1-attempt-2 --json
boxd exec task-1-attempt-2 'claude -p "Apply the schema migration" --dangerously-skip-permissions'
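The fork-and-retry pattern can be wrapped in a small helper. This is a sketch under assumptions: `retry_in_fork` and the `-attempt-N` naming are illustrative, and it assumes `boxd exec` exits non-zero when the remote command fails, which this page does not confirm.

```shell
# Run a risky command in a fresh fork of the parent. On failure, discard the
# fork and try again in a new one. Prints the name of the fork that succeeded.
# Assumption: `boxd exec` exits non-zero when the remote command fails.
retry_in_fork() {
  local parent="$1"
  local max="$2"
  local cmd="$3"
  local n=1
  local fork
  while [ "$n" -le "$max" ]; do
    fork="$parent-attempt-$n"
    boxd fork "$parent" --name="$fork" --json >/dev/null || return 1
    # Send the command's own output to stderr so stdout carries only the name.
    if boxd exec "$fork" "$cmd" >&2; then
      echo "$fork"
      return 0
    fi
    boxd destroy "$fork" >/dev/null   # throw away the failed attempt
    n=$((n + 1))
  done
  return 1
}
```

A call such as `retry_in_fork task-1 3 'claude -p "Apply the schema migration" --dangerously-skip-permissions'` would leave the parent untouched and keep only the fork where the command succeeded.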

Fan out across many VMs

Run the same task in parallel and pick the best output. Every result has its own URL.

for i in 1 2 3; do boxd new --name=try-$i --json & done; wait
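for i in 1 2 3; do
  # Background each exec locally so the three runs proceed in parallel
  boxd exec try-$i "claude -p \"Implement option $i\" --dangerously-skip-permissions" &
done
wait  # block until every run has finished
# Results land at https://try-1.boxd.sh, https://try-2.boxd.sh, and https://try-3.boxd.sh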
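To pick the best output you also need to know which runs finished cleanly. The sketch below keeps each run's output in its own file and reports a per-VM status. `fan_out` and the output directory are hypothetical, the `try-N` VMs are assumed to exist already, and it assumes `boxd exec` passes through the remote exit status.

```shell
# Start one agent run per VM in the background, then wait for each one and
# report whether it succeeded. Output of run N goes to <outdir>/try-N.out.
# Assumption: `boxd exec` exits non-zero when the remote command fails.
fan_out() {
  local count="$1"
  local outdir="$2"
  local pids=""
  local failed=0
  local i=1
  local pid
  while [ "$i" -le "$count" ]; do
    boxd exec "try-$i" "claude -p \"Implement option $i\" --dangerously-skip-permissions" \
      > "$outdir/try-$i.out" 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  i=1
  for pid in $pids; do
    if wait "$pid"; then
      echo "try-$i ok"
    else
      echo "try-$i failed"
      failed=1
    fi
    i=$((i + 1))
  done
  return "$failed"
}
```

Calling `fan_out 3 ./results` prints one status line per VM and returns non-zero if any run failed, so a wrapper script can decide which sandboxes are worth inspecting.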

Destroy on completion

Wire the destroy step into your PR-close hook or the agent’s exit path. You’re billed for what you run, so kill VMs you don’t need.
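One way to wire destroy into the exit path of a script is a shell `trap`. This is a minimal sketch: `with_sandbox` is a hypothetical helper, and the only boxd commands it uses are the ones shown on this page.

```shell
# Create a sandbox, run one command in it, and destroy it on the way out,
# whether the command succeeded or not. The function body is a subshell, so
# the EXIT trap fires when the function finishes and does not leak outward.
with_sandbox() (
  name="$1"
  cmd="$2"
  boxd new --name="$name" --json >/dev/null || exit 1
  # Register cleanup only after the VM exists.
  trap 'boxd destroy "$name" >/dev/null' EXIT
  boxd exec "$name" "$cmd"
  exit $?
)
```

The helper returns the status of the command it ran, so a caller can still tell success from failure after the VM is gone.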

FAQ

What stops an agent from breaking out?

Each sandbox is isolated by hardware virtualization (KVM), the same kind of boundary your laptop’s hypervisor uses. The agent has root inside its VM but no path to the host or to other VMs. Internet egress is the only shared surface.

How many sandboxes can I run at once?

Ten VMs by default, extendable on request. Each gets 2 vCPU, 8 GiB RAM, 100 GB disk.

Can the agent install kernel modules or run Docker?

Yes. Real kernel, real systemd, real Docker. Nesting works because there’s no container in the way.

How do I recover a run that went wrong?

If you forked from a golden (a known-good parent VM), just boxd destroy the bad fork and fork again. The golden is untouched.

Next

Fork from a golden

Warm copies of your app in ~160ms. The sandbox source.

Fix-on-issue loop

The full end-to-end agent loop on GitHub issues.
