Two ways to wire an agent
Both are first-class. Pick the one that matches how your agent is structured.

| | Agent inside the VM | Sandbox runtime |
|---|---|---|
| Where the agent lives | On the VM | On your laptop, server, or function |
| How it talks to the VM | claude / opencode / boxd CLI on PATH | boxd SDK (TypeScript or Python) |
| One VM is | One agent session | One task / workspace / session |
| Best for | Interactive sessions, fan-out from inside | Pluggable backend in an agent framework |
| See | Agent sandboxes | This page |
Why a per-task VM
- Isolation. Real KVM microVM with its own kernel. The agent has root inside it and no path to the host or to other sandboxes. Run untrusted code, `rm -rf /`, kernel modules, nested Docker. None of it touches anything outside the box.
- Persistence. The 100 GB disk survives across exec calls. Install the toolchain once; reuse it for the rest of the session. Suspend the VM between turns and the filesystem and running processes wait for the next prompt.
- Sub-ms resume. Warm-suspended VMs wake instantly. The agent can park a task, do something else, and pick up where it left off without paying cold-start tax.
- Fork on retry. Snapshot the workspace before a destructive step. If the model goes sideways, fork the parent and re-run. The parent is untouched.
- Scale-out. ~50ms boot, ~160ms fork. Run a hundred agent tasks in parallel without queuing them through one machine.
What the integration looks like
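As a rough sketch of the lifecycle, the example below uses an in-memory stand-in for a VM. `FakeBox`, its methods, and `run_unit_of_work` are hypothetical names invented for illustration; they are not the boxd SDK, whose actual calls are documented in the SDK references.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBox:
    """In-memory stand-in for one VM. Hypothetical; not the boxd SDK."""
    name: str
    state: str = "running"
    history: list = field(default_factory=list)

    def exec(self, command: str) -> str:
        if self.state == "destroyed":
            raise RuntimeError(f"{self.name} is destroyed")
        # Assumption: a suspended VM wakes on the next call, so mimic that here.
        self.state = "running"
        self.history.append(command)
        return f"ran: {command}"

    def suspend(self) -> None:
        self.state = "suspended"

    def destroy(self) -> None:
        self.state = "destroyed"


def run_unit_of_work(box: FakeBox, turns: list) -> list:
    """Exec each turn's commands, suspend between turns, destroy when done."""
    outputs = []
    try:
        for commands in turns:
            for command in commands:
                outputs.append(box.exec(command))
            box.suspend()  # park the VM until the next prompt arrives
    finally:
        box.destroy()  # always clean up, even if a command raised
    return outputs


box = FakeBox(name="task-1")
outputs = run_unit_of_work(box, [["git clone repo"], ["npm test"]])
```

The `try`/`finally` is the one design choice worth keeping in a real integration: the VM is destroyed even when a turn fails partway.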
The shape is small. One VM per unit of work, exec commands into it, optionally suspend between turns, destroy when done.
Patterns
One VM per workspace
Long-running, persistent. The workspace ID maps to a VM name; the VM warm-suspends between turns and resumes sub-ms when the next prompt arrives. The agent never sees the suspension. Repo state, installed deps, and running services all survive.
One VM per task
Short-lived. Fork from a golden when the task starts, run the agent, destroy on completion. The golden ships with the toolchain and the repo pre-installed so cold-start is ~160ms instead of "wait for `npm install`".
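The per-task pattern can be sketched with an in-memory model in which a fork is a deep copy of the golden's disk. `FakeVM`, `fork`, `destroy`, and `run_task` are hypothetical names for illustration only and do not reflect the real SDK surface.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeVM:
    """In-memory stand-in for a VM and its disk. Hypothetical; not the boxd SDK."""
    name: str
    files: dict = field(default_factory=dict)
    alive: bool = True

    def fork(self, name: str) -> "FakeVM":
        # The child starts with its own copy of everything the golden has installed.
        return FakeVM(name=name, files=copy.deepcopy(self.files))

    def destroy(self) -> None:
        self.alive = False


def run_task(golden: FakeVM, task_id: str, agent):
    """Fork from the golden, run the agent in the fork, destroy on completion."""
    vm = golden.fork(f"task-{task_id}")
    try:
        return agent(vm)
    finally:
        vm.destroy()


golden = FakeVM("golden", {"node_modules": "installed", "repo/README.md": "hello"})


def agent(vm: FakeVM) -> str:
    # The agent writes into its own fork; the golden never sees this file.
    vm.files["repo/out.txt"] = "done"
    return f"{vm.name} saw deps: {vm.files['node_modules']}"


result = run_task(golden, "42", agent)
```

Because every task starts from the same golden, a task cannot leak state into the next one.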
Fork on retry
Snapshot before a destructive step. If the agent goes off the rails, fork the parent again and re-run with a different prompt. The parent is untouched and you can compare multiple attempts side-by-side at https://try-1.boxd.sh, https://try-2.boxd.sh, etc.
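A minimal sketch of the retry loop, again with an in-memory stand-in: each attempt runs in a fresh fork named `try-N`, and the parent is never mutated. `FakeVM`, `retry_with_forks`, and `migrate` are hypothetical and exist only for this illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeVM:
    """In-memory stand-in for a VM and its disk. Hypothetical; not the boxd SDK."""
    name: str
    files: dict = field(default_factory=dict)

    def fork(self, name: str) -> "FakeVM":
        return FakeVM(name=name, files=copy.deepcopy(self.files))


def retry_with_forks(parent: FakeVM, prompts: list, step) -> list:
    """Run `step` in a fresh fork per prompt; stop at the first success.

    Returns every (fork, succeeded) pair so attempts can be compared afterwards.
    """
    attempts = []
    for n, prompt in enumerate(prompts, start=1):
        child = parent.fork(f"try-{n}")
        ok = step(child, prompt)
        attempts.append((child, ok))
        if ok:
            break
    return attempts


def migrate(vm: FakeVM, prompt: str) -> bool:
    """Toy destructive step: drops the old schema, succeeds only with a backup."""
    vm.files.pop("schema.sql")
    if "backup" in prompt:
        vm.files["schema_v2.sql"] = "new schema"
        return True
    return False


parent = FakeVM("workspace", {"schema.sql": "old schema"})
attempts = retry_with_forks(
    parent,
    ["migrate the schema", "migrate the schema, backup first"],
    migrate,
)
```

Failed forks are kept in the returned list rather than destroyed, which is what makes side-by-side comparison possible.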
FAQ
How is this different from Agent sandboxes?
Agent sandboxes is the pattern where the agent itself runs INSIDE the VM (Claude Code, Codex, OpenCode CLI on the image, driven via `boxd exec`). This page is the pattern where the agent runs OUTSIDE and uses boxd as its execution backend. Same VMs, same primitives, different wiring.
Why not just run a container per task?
Containers share the host's kernel. An agent with root inside a container can install kernel modules, run `systemd`, nest Docker, or break the host. A microVM gives you all of that and contains it. Cold start is in the same ballpark too: ~50ms boot, ~160ms fork.
What about idle cost when the agent isn't running?
Auto-suspend kicks in after 30s of no inbound traffic (configurable). Suspended VMs cost near zero. Resume is sub-ms when the next request arrives, so the agent doesn't notice the suspension.
How do I persist state across agent sessions?
Two options. Keep the VM warm-suspended (sub-ms resume, full state including running processes), or fork from a golden each time (~160ms, gets a copy of whatever the golden has installed). The "Fork from a golden" page covers the second pattern.
How many VMs can I run in parallel?
Ten per account by default, extendable on request. Each gets 2 vCPU, 8 GiB RAM, 100 GB disk.
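One way to stay under that limit on the client side is to cap in-flight tasks with a semaphore. This is a sketch using only Python's standard `asyncio`; `run_all` and `fake_task` are hypothetical names, and `fake_task` stands in for whatever coroutine creates a VM and runs an agent task in it.

```python
import asyncio

MAX_PARALLEL_VMS = 10  # default per-account limit stated above


async def run_all(task_ids, run_task, limit=MAX_PARALLEL_VMS):
    """Run every task, keeping at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task_id):
        async with semaphore:  # waits here while `limit` tasks are already running
            return await run_task(task_id)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(t) for t in task_ids))


# Demo: count how many fake tasks overlap to confirm the cap holds.
stats = {"active": 0, "peak": 0}


async def fake_task(task_id):
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stand-in for real work inside a VM
    stats["active"] -= 1
    return task_id * 2


results = asyncio.run(run_all(range(25), fake_task))
```

If the account limit has been raised, pass the new value as `limit` instead of changing the constant.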
Next
TypeScript SDK
Full SDK reference. Compute, Box, Proxy, Network.
Python SDK
Same SDK shape in Python, sync and async.
Suspend & resume
How warm-suspend and sub-ms wake actually work.
Fork from a golden
Per-task copies of a pre-installed app.