Your coding agent runs anywhere (a laptop, a server, a serverless function) and plugs boxd in as the execution layer underneath it. Each task gets its own VM with real Linux, persistent disk, fork on retry, and sub-ms resume. The agent never has to live on the box; the box just shows up when work arrives.

Two ways to wire an agent

Both are first-class. Pick the one that matches how your agent is structured.
| | Agent inside the VM | Sandbox runtime |
| --- | --- | --- |
| Where the agent lives | On the VM | On your laptop, server, or function |
| How it talks to the VM | claude / opencode / boxd CLI on PATH | boxd SDK (TypeScript or Python) |
| One VM is | One agent session | One task / workspace / session |
| Best for | Interactive sessions, fan-out from inside | Pluggable backend in an agent framework |
| See | Agent sandboxes | This page |

Why a per-task VM

  • Isolation. Real KVM microVM with its own kernel. The agent has root inside it and no path to the host or to other sandboxes. Run untrusted code, rm -rf /, kernel modules, nested Docker. None of it touches anything outside the box.
  • Persistence. The 100 GB disk survives across exec calls. Install the toolchain once; reuse it for the rest of the session. Suspend the VM between turns and the filesystem and running processes wait for the next prompt.
  • Sub-ms resume. Warm-suspended VMs wake instantly. The agent can park a task, do something else, and pick up where it left off without paying cold-start tax.
  • Fork on retry. Snapshot the workspace before a destructive step. If the model goes sideways, fork the parent and re-run. The parent is untouched.
  • Scale-out. ~50ms boot, ~160ms fork. Run a hundred agent tasks in parallel without queuing them through one machine.

What the integration looks like

The shape is small. One VM per unit of work, exec commands into it, optionally suspend between turns, destroy when done.
import { Compute } from "@boxd-sh/sdk";

const c = new Compute({ apiKey: process.env.BOXD_API_KEY });

// One VM per task / workspace / session
const id = "example"; // placeholder; use your own task, workspace, or session ID
const box = await c.box.create({ name: `task-${id}` });

await box.exec(["bash", "-lc", "git clone ..."]);
await box.exec(["bash", "-lc", "npm test"]);

// Warm-suspend between agent turns -- sub-ms resume on the next call
await box.suspend();
await box.resume();

// Clean up
await box.destroy();
Python is identical in shape; see the Python SDK.

Patterns

One VM per workspace

Long-running, persistent. The workspace ID maps to a VM name; the VM warm-suspends between turns and resumes sub-ms when the next prompt arrives. The agent never sees the suspension. Repo state, installed deps, and running services all survive.
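A minimal sketch of the workspace-to-VM mapping described above, under stated assumptions. `WorkspaceVMs`, `WorkspaceBox`, and `WorkspaceBoxFactory` are hypothetical names defined here for illustration. Only `create`, `exec`, `suspend`, and `resume` mirror the calls in the earlier snippet, and their return types are guesses. The `ws-` name prefix is also an arbitrary choice.

```typescript
// Structural stand-ins for the SDK calls shown earlier (assumed shapes,
// not the SDK's real type definitions).
interface WorkspaceBox {
  exec(argv: string[]): Promise<unknown>;
  suspend(): Promise<unknown>;
  resume(): Promise<unknown>;
}

interface WorkspaceBoxFactory {
  create(opts: { name: string }): Promise<WorkspaceBox>;
}

// Hypothetical registry: one VM per workspace ID, parked between turns.
class WorkspaceVMs {
  private readonly factory: WorkspaceBoxFactory;
  private readonly boxes = new Map<string, WorkspaceBox>();

  constructor(factory: WorkspaceBoxFactory) {
    this.factory = factory;
  }

  // Run one agent turn: create or wake the VM, exec, then park it again.
  async runTurn(workspaceId: string, argv: string[]): Promise<unknown> {
    let box = this.boxes.get(workspaceId);
    if (box === undefined) {
      // First turn: the workspace ID maps directly to the VM name.
      box = await this.factory.create({ name: `ws-${workspaceId}` });
      this.boxes.set(workspaceId, box);
    } else {
      await box.resume();
    }
    try {
      return await box.exec(argv);
    } finally {
      // Warm-suspend so the disk and running processes wait for the next prompt.
      await box.suspend();
    }
  }
}
```

With the `c` client from the earlier snippet, `new WorkspaceVMs(c.box)` should fit this shape if the real SDK types agree. The in-memory `Map` is lost when the agent process restarts, and two concurrent turns on the same workspace would race, so a production version would need a durable lookup and per-workspace serialization.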

One VM per task

Short-lived. Fork from a golden when the task starts, run the agent, destroy on completion. The golden ships with the toolchain and the repo pre-installed so cold-start is ~160ms instead of "wait for npm install".
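The per-task lifecycle can be sketched as follows. The SDK snippet earlier on this page does not show a fork call, so `fork({ name })` here is a hypothetical method name and signature, as are `GoldenBox`, `TaskBox`, and `runTask`. Check the SDK reference for the real call before using this.

```typescript
interface TaskBox {
  exec(argv: string[]): Promise<unknown>;
  destroy(): Promise<unknown>;
}

// ASSUMPTION: `fork` is a hypothetical method; the real SDK may expose
// forking under a different name or on a different object.
interface GoldenBox {
  fork(opts: { name: string }): Promise<TaskBox>;
}

// Fork the golden, run one command for the task, and always clean up.
async function runTask(
  golden: GoldenBox,
  taskId: string,
  argv: string[],
): Promise<unknown> {
  const box = await golden.fork({ name: `task-${taskId}` });
  try {
    return await box.exec(argv);
  } finally {
    // Runs on success and on failure, so a failed task never leaks a VM.
    await box.destroy();
  }
}
```

Putting `destroy` in `finally` keeps short-lived VMs from accumulating against the account limit when a task throws.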

Fork on retry

Snapshot before a destructive step. If the agent goes off the rails, fork the parent again and re-run with a different prompt. The parent is untouched and you can compare multiple attempts side-by-side at https://try-1.boxd.sh, https://try-2.boxd.sh, etc.
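One way to structure the retry loop, as a sketch rather than the SDK's own API. `fork({ name })` is again a hypothetical signature, and `retryOnForks`, `ForkableBox`, `AttemptBox`, `Attempt`, and the `runAgent` callback are illustrative names. The `try-N` naming follows the URLs mentioned above.

```typescript
interface AttemptBox {
  exec(argv: string[]): Promise<unknown>;
  destroy(): Promise<unknown>;
}

// ASSUMPTION: `fork` is a hypothetical method name and signature.
interface ForkableBox {
  fork(opts: { name: string }): Promise<AttemptBox>;
}

interface Attempt {
  name: string;
  prompt: string;
  box: AttemptBox;
  ok: boolean;
}

// Try each prompt on a fresh fork of the parent, stopping at the first success.
// Failed forks are kept (not destroyed) so attempts can be compared afterwards.
async function retryOnForks(
  parent: ForkableBox,
  prompts: string[],
  runAgent: (box: AttemptBox, prompt: string) => Promise<boolean>,
): Promise<Attempt[]> {
  const attempts: Attempt[] = [];
  for (const [i, prompt] of prompts.entries()) {
    const name = `try-${i + 1}`;
    const box = await parent.fork({ name }); // the parent itself is never modified
    let ok = false;
    try {
      ok = await runAgent(box, prompt);
    } catch {
      ok = false; // a thrown error counts as a failed attempt
    }
    attempts.push({ name, prompt, box, ok });
    if (ok) break;
  }
  return attempts;
}
```

The caller is responsible for calling `destroy()` on every returned `box` once the comparison is done.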

FAQ

Agent sandboxes is the pattern where the agent itself runs INSIDE the VM (Claude Code, Codex, OpenCode CLI on the image, driven via boxd exec). This page is the pattern where the agent runs OUTSIDE and uses boxd as its execution backend. Same VMs, same primitives, different wiring.
Containers share a kernel. An agent with root inside a container can install kernel modules, run systemd, nest Docker, or break the host. A microVM gives the agent all of those capabilities and contains them. Cold start is in the same ballpark too: ~50ms boot, ~160ms fork.
Auto-suspend kicks in after 30s of no inbound traffic (configurable). Suspended VMs cost near zero. Sub-ms resume when the next request arrives; the agent doesn't notice the suspension.
Two options. Keep the VM warm-suspended (sub-ms resume, full state including running processes). Or fork from a golden each time (~160ms, gets a copy of whatever the golden has installed). Fork from a golden covers the second pattern.
The default limit is ten VMs per account, extendable on request. Each VM gets 2 vCPU, 8 GiB RAM, 100 GB disk.
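When fanning out many tasks from an agent running outside the VMs, a client-side concurrency cap keeps the number of live VMs under the account limit. The helper below is a generic sketch and not part of the boxd SDK: `mapWithLimit` and `DEFAULT_VM_LIMIT` are hypothetical names, and the value 10 is taken from the default stated above.

```typescript
// Default per-account VM limit, as stated on this page.
const DEFAULT_VM_LIMIT = 10;

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A worker would typically create or fork one VM, run the task, and destroy the VM before returning, so each lane holds at most one VM at a time. If a worker rejects, the returned promise rejects while the other lanes keep draining the queue; wrap the worker in its own error handling if partial results matter.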

Next

TypeScript SDK

Full SDK reference. Compute, Box, Proxy, Network.

Python SDK

Same SDK shape in Python, sync and async.

Suspend & resume

How warm-suspend and sub-ms wake actually work.

Fork from a golden

Per-task copies of a pre-installed app.