Sandbox runtime for agents

Your coding agent runs anywhere — a laptop, a server, a serverless function — and plugs boxd in as the execution layer underneath it. Each task gets its own VM with real Linux, persistent disk, fork on retry, and sub-ms resume. The agent never has to live on the box; the box just shows up when work arrives.

Two ways to wire an agent

Both are first-class. Pick the one that matches how your agent is structured.

	Agent inside the VM	Sandbox runtime
Where the agent lives	On the VM	On your laptop, server, or function
How it talks to the VM	`claude` / `opencode` / `boxd` CLI on PATH	boxd SDK (TypeScript or Python)
One VM is	One agent session	One task / workspace / session
Best for	Interactive sessions, fan-out from inside	Pluggable backend in an agent framework
See	Agent sandboxes	This page

Why a per-task VM

Isolation. Real KVM microVM with its own kernel. The agent has root inside it and no path to the host or to other sandboxes. Run untrusted code, rm -rf /, kernel modules, nested Docker. None of it touches anything outside the box.
Persistence. The 100 GB disk survives across exec calls. Install the toolchain once; reuse it for the rest of the session. Suspend the VM between turns and the filesystem and running processes wait for the next prompt.
Sub-ms resume. Warm-suspended VMs wake instantly. The agent can park a task, do something else, and pick up where it left off without paying cold-start tax.
Fork on retry. Snapshot the workspace before a destructive step. If the model goes sideways, fork the parent and re-run. The parent is untouched.
Scale-out. ~50ms boot, ~160ms fork. Run a hundred agent tasks in parallel without queuing them through one machine.

What the integration looks like

The shape is small. One VM per unit of work, exec commands into it, optionally suspend between turns, destroy when done.

import { Compute } from "@boxd-sh/sdk";

const c = new Compute({ apiKey: process.env.BOXD_API_KEY });

// One VM per task / workspace / session
const box = await c.box.create({ name: `task-${id}` });

await box.exec(["bash", "-lc", "git clone ..."]);
await box.exec(["bash", "-lc", "npm test"]);

// Warm-suspend between agent turns -- sub-ms resume on the next call
await box.suspend();
await box.resume();

// Clean up
await box.destroy();

Python is identical in shape — see the Python SDK.

Patterns

One VM per workspace

Long-running, persistent. The workspace ID maps to a VM name; the VM warm-suspends between turns and resumes sub-ms when the next prompt arrives. The agent never sees the suspension. Repo state, installed deps, and running services all survive.

One VM per task

Short-lived. Fork from a golden when the task starts, run the agent, destroy on completion. The golden ships with the toolchain and the repo pre-installed so cold-start is ~160ms instead of “wait for npm install”.

Fork on retry

Snapshot before a destructive step. If the agent goes off the rails, fork the parent again and re-run with a different prompt. The parent is untouched and you can compare multiple attempts side-by-side at https://try-1.boxd.sh, https://try-2.boxd.sh, etc.

FAQ

How is this different from Agent sandboxes?

Agent sandboxes is the pattern where the agent itself runs INSIDE the VM (Claude Code, Codex, OpenCode CLI on the image, driven via boxd exec). This page is the pattern where the agent runs OUTSIDE and uses boxd as its execution backend. Same VMs, same primitives, different wiring.

Why not just run a container per task?

Containers share a kernel. An agent with root inside a container can install kernel modules, run systemd, nest Docker, or break the host. A microVM gives you all of that and contains it. Cold start is in the same ballpark too — 50ms boot, 160ms fork.

What about idle cost when the agent isn't running?

Auto-suspend kicks in after 30s of no inbound traffic (configurable). Suspended VMs cost near zero. Sub-ms resume when the next request arrives — the agent doesn’t notice the suspension.

How do I persist state across agent sessions?

Two options. Keep the VM warm-suspended (sub-ms resume, full state including running processes). Or fork from a golden each time (~160ms, gets a copy of whatever the golden has installed). Fork from a golden covers the second pattern.

How many VMs can I run in parallel?

Ten per account by default, extendable on request. Each gets 2 vCPU, 8 GiB RAM, 100 GB disk.

TypeScript SDK

Full SDK reference. Compute, Box, Proxy, Network.

Python SDK

Same SDK shape in Python, sync and async.

Suspend & resume

How warm-suspend and sub-ms wake actually work.

Fork from a golden

Per-task copies of a pre-installed app.

​Two ways to wire an agent

​Why a per-task VM

​What the integration looks like

​Patterns

​One VM per workspace

​One VM per task

​Fork on retry

​FAQ

​Next