# ADR-019 — Sandcastle for Agent Orchestration

**Status:** Accepted
**Date:** 2026-05-13
**Spec:** docs/architecture/agent-first-workflow-and-conformance.md
**Companion guide:** docs/guides/runbook.md ("Using Sandcastle for agent dispatch")
**Related:** ADR-011 (TDD foundation), ADR-012 (feature conventions), ADR-015 (events and jobs)

## Context

This template is designed for **agent-driven feature development**. The conformance
system (ADR-012 + the post-ADR conformance-system-v1 epic) gives agents a tight,
layered feedback loop: type errors in 0s, lint in <1s, boot assertion in ~3s, CI
gates in ~120s. The remaining substrate question is: how does an agent actually
get dispatched against a task?

Three pieces are needed:

1. **A way to invoke an agent** (Claude / Codex) with a task description,
   inside a sandbox so the agent can't break the host while iterating.
2. **A way to capture the agent's commits** so a reviewer agent can inspect
   the diff and approve or reject.
3. **A way to compose the above into a per-task dispatch loop** with retry
   semantics, branch management, and integration into the existing
   docs/work/ task system.

Without a substrate that handles all three, agentic development falls back to
copy-paste-prompt-by-hand, which is slow and error-prone.

## Decision

Adopt [Sandcastle](https://github.com/mattpocock/sandcastle) (`@ai-hero/sandcastle`)
as the agent-orchestration substrate. `pnpm work dispatch` is the entry point.

Concretely:

1. **`@ai-hero/sandcastle` is a workspace-root devDependency.** Declared as
   `^2.73.0` at adoption; the caret range lets pnpm resolve later 2.x minor and
   patch releases automatically.
2. **`.sandcastle/` holds the canonical prompt templates.** Five role-specific
   prompts: PRD eliciter, ADR eliciter, decomposer, implementer, reviewer.
   Each enforces the **generator-first** rule (prefer `pnpm turbo gen <kind>`
   over hand-rolling; see saved memory `generator-first-for-agents`).
3. **`.sandcastle/Dockerfile`** is the sandbox baseline
   (node:22-bookworm-slim + pnpm via corepack). The agent runs
   `pnpm install --frozen-lockfile` as its first step per the implementer prompt.
4. **`scripts/work/dispatch.mjs` is the orchestrator.** It reads `_state.json`,
   finds the first ready story's first unchecked AC bullet, builds a task spec,
   and calls `sandcastle.run({ promptFile, promptArgs: { TASK_FILE_CONTENT } })`
   for the implementer, then again for the reviewer with `{{DIFF}}`. The
   orchestrator does NOT mutate state in v1; it prints suggested mutations
   for the human to apply.
5. **Two modes:** `pnpm work dispatch` (planning, no agent invoked) and
   `pnpm work dispatch --execute` (real sandcastle call; requires auth, see
   point 7).
6. **Reviewer agent verifies generator-first.** Hand-rolled output that should
   have been a `pnpm turbo gen <kind>` invocation is grounds for rejection.
7. **Bring-your-own-auth.** Two paths are supported, in priority order:
   - **Subscription (primary):** bind-mount the host's `~/.claude/` into the
     sandbox. Claude Code CLI inside the sandbox uses the host's logged-in
     subscription session. Zero per-task token spend for Pro/Max subscribers.
     Path overridable via the `SANDCASTLE_CLAUDE_CREDS_DIR` env var.
   - **API key (fallback):** `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` passed
     through to the sandbox env. Used when no host creds directory exists.
   - The resolver (`resolveClaudeAuth` in `scripts/work/dispatch.mjs`) picks
     automatically, with subscription always preferred. Sandcastle's own
     issue #191 documents that subscription support won't be added natively;
     this mount-based pattern is our workaround promoted to first-class.
8. **Per-task max-attempts honoured (v2).** Each task's frontmatter may carry
   `max-attempts: N` to bound the implementer↔reviewer retry loop. Default 3.

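The priority order in point 7 can be sketched as a small pure function. This is a minimal illustration, not the actual `resolveClaudeAuth` from `scripts/work/dispatch.mjs`, whose signature this ADR does not show: the name `resolveAuthSketch`, the injected `homeDir` and `dirExists` parameters, and the returned object shape are all assumptions made for the example.

```javascript
// Sketch only: the real `resolveClaudeAuth` signature is not shown in this ADR.
// `homeDir` and `dirExists` are injected (hypothetical parameters) so the
// example runs without reading the host's credentials directory.
function resolveAuthSketch({ env, homeDir, dirExists }) {
  // Subscription path: the override env var wins over the default ~/.claude.
  const credsDir = env.SANDCASTLE_CLAUDE_CREDS_DIR || `${homeDir}/.claude`;
  if (dirExists(credsDir)) {
    return { mode: "subscription", mount: credsDir };
  }

  // Fallback path: pass through whichever API keys are set.
  const passthrough = {};
  for (const name of ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]) {
    if (env[name]) passthrough[name] = env[name];
  }
  if (Object.keys(passthrough).length > 0) {
    return { mode: "api-key", env: passthrough };
  }

  throw new Error("No credentials directory and no API key found");
}

// Subscription is preferred even when an API key is also present.
const withCredsDir = resolveAuthSketch({
  env: { ANTHROPIC_API_KEY: "example-key" },
  homeDir: "/home/dev",
  dirExists: () => true,
});
console.log(withCredsDir.mode);

// With no credentials directory, the API key is used instead.
const withoutCredsDir = resolveAuthSketch({
  env: { ANTHROPIC_API_KEY: "example-key" },
  homeDir: "/home/dev",
  dirExists: () => false,
});
console.log(withoutCredsDir.mode);
```

In a real script the arguments would presumably be `process.env`, `os.homedir()`, and `fs.existsSync`; injecting them keeps the sketch testable without touching the host.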
## Alternatives considered

- **Bare Claude Code / Codex CLI invocation per task**: rejected. No sandbox
  isolation; no consistent prompt template surface; no built-in branch
  management; no reviewer-loop primitive.
- **GitHub Copilot Workspace / native CI agent**: rejected. Vendor lock-in;
  workflow lives outside the repo; no local equivalent for development time.
- **Custom orchestrator built from scratch on the Anthropic SDK**: rejected.
  Sandcastle already solves sandbox + branch + structured-output extraction;
  rebuilding it is not the leverage point.
- **No orchestrator (humans dispatch each task manually via copy-paste)**:
  rejected as the steady-state mode, but supported as a fallback via planning
  mode (`pnpm work dispatch` without `--execute`).
- **A different sandbox provider (Vercel sandboxes, Daytona, native fly.io)**:
  sandcastle is provider-agnostic; the choice of provider sits behind the
  `SANDCASTLE_PROVIDER` env var and can change without disrupting prompts or
  orchestrator code. Default is Docker.

## Consequences

### Positive

- **Per-task isolation.** Each implementer dispatch runs in its own Docker
  sandbox + sandbox branch. Bad agent output stays in the branch; merge to
  `main` is gated by the reviewer agent and the full 5-gate stack.
- **Provider-agnostic.** Switching from Claude to Codex (or to a future
  agent runtime) is a one-line change to the prompt's `agent` parameter.
- **Composable with existing workflow.** `pnpm work` CLI already reads
  `_state.json` and the docs/work/ markdown; dispatch is one more subcommand
  layered on top.
- **Cost-aware default.** Planning mode invokes no agent; only `--execute`
  spends tokens. Operators choose when to escalate from plan to execute.
- **Recoverable failure modes.** If an implementer goes off the rails, its diff
  lives on a sandbox branch: review, reject, re-dispatch with notes.

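The `_state.json` lookup and the plan/execute split described above can be sketched roughly as follows. This ADR does not define the `_state.json` schema or the task markdown layout, so the `stories`, `status`, and `body` fields, the `- [ ]` checkbox convention, and the `runAgent` callback (a stand-in for the sandcastle call) are all assumptions made for illustration.

```javascript
// Hypothetical shapes: `stories`, `status`, and `body` are illustrative
// assumptions, not the real `_state.json` schema.

/** Return the text of the first unchecked `- [ ]` bullet, or null if none. */
function firstUncheckedBullet(markdown) {
  for (const line of markdown.split("\n")) {
    const match = /^\s*[-*]\s+\[ \]\s+(.*)$/.exec(line);
    if (match) return match[1].trim();
  }
  return null;
}

/** First story marked "ready" that still has an unchecked AC bullet. */
function findNextTask(state) {
  for (const story of state.stories) {
    if (story.status !== "ready") continue;
    const bullet = firstUncheckedBullet(story.body);
    // Assumption: a ready story with every bullet ticked is skipped.
    if (bullet !== null) return { storyId: story.id, bullet };
  }
  return null;
}

/**
 * Planning mode returns the plan without calling the agent; only `--execute`
 * invokes `runAgent`, a hypothetical stand-in for the sandcastle call.
 */
function dispatch(argv, state, runAgent) {
  const task = findNextTask(state);
  if (task === null) return { mode: "idle", invoked: false };
  if (!argv.includes("--execute")) {
    return { mode: "plan", invoked: false, task };
  }
  return { mode: "execute", invoked: true, task, result: runAgent(task) };
}

const exampleState = {
  stories: [
    { id: "story-1", status: "done", body: "- [x] shipped" },
    { id: "story-2", status: "ready", body: "- [x] scaffold\n- [ ] add handler\n- [ ] add tests" },
  ],
};

// Planning mode: nothing is invoked, so no tokens are spent.
console.log(dispatch([], exampleState, () => "unused"));
// Execute mode: the stand-in agent receives the selected task.
console.log(dispatch(["--execute"], exampleState, (task) => `ran ${task.storyId}`));
```

Keeping the selection logic separate from the agent call is what lets planning mode stay free: the same `findNextTask` result feeds both paths.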
### Negative / accepted trade-offs

- **External dependency on sandcastle.** If the project stalls, we either pin
  and maintain a fork or migrate to another orchestrator. Sandcastle is small
  enough (~3 KLOC) that a fork is manageable.
- **Token cost is real.** A complex task can use 100K-200K tokens per
  implementer + reviewer round-trip. Operators budget per-dispatch; planning
  mode and the optional `max-attempts` frontmatter cap the exposure.
- **Docker dependency for the default sandbox.** Without Docker (or a
  provider swap), `--execute` won't run. Documented in the runbook.
- **State mutation is manual in v1.** The orchestrator prints suggested
  state mutations; a human ticks the AC bullet + commits. Auto-mutation is
  v2 work, gated on confidence that the reviewer's decision can be trusted
  without human inspection.

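To make the `max-attempts` cap concrete, here is a minimal sketch of a bounded implementer/reviewer loop. The `runImplementer` and `runReviewer` callbacks are hypothetical stand-ins for the two sandcastle calls, and their arguments and return shapes are assumptions; only the frontmatter key and the default of 3 come from this ADR. Taking the 100K-200K estimate above at face value, the default cap would bound a single task at roughly 300K-600K tokens.

```javascript
// `runImplementer` and `runReviewer` are hypothetical stand-ins for the two
// sandcastle calls; their arguments and return shapes are assumptions.

/** Read `max-attempts` from parsed frontmatter, defaulting to 3. */
function resolveMaxAttempts(frontmatter) {
  const value = Number(frontmatter?.["max-attempts"]);
  return Number.isInteger(value) && value > 0 ? value : 3;
}

/** Retry implementer then reviewer until approval or the attempt cap. */
async function dispatchWithRetries(task, { runImplementer, runReviewer }) {
  const maxAttempts = resolveMaxAttempts(task.frontmatter);
  let notes = null; // reviewer feedback carried into the next attempt
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const diff = await runImplementer(task, notes);
    const review = await runReviewer(task, diff);
    if (review.approved) return { approved: true, attempts: attempt };
    notes = review.notes;
  }
  return { approved: false, attempts: maxAttempts };
}

// Example: a reviewer that rejects the first diff and approves the second.
let reviewCount = 0;
const demoOutcome = dispatchWithRetries(
  { id: "task-1", frontmatter: { "max-attempts": 2 } },
  {
    runImplementer: async (task, notes) => `diff for ${task.id} (notes: ${notes})`,
    runReviewer: async () => ({
      approved: ++reviewCount >= 2,
      notes: "use the generator",
    }),
  },
);
demoOutcome.then((outcome) => console.log(outcome));
```

Returning the attempt count alongside the verdict gives operators the number they need to estimate spend per task.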
### Follow-up work

- **Auto state mutation**: when the reviewer agent's decision is approve,
  the orchestrator could automatically tick the AC bullet + commit. Currently
  manual; promote when reviewer confidence is established empirically.
- **Multi-task batch dispatch**: `pnpm work dispatch --all-ready` would
  fan out across all ready stories. Requires DAG-aware concurrency
  (no two implementers touching the same files).
- **Sandcastle CI image alignment**: the `.sandcastle/Dockerfile` is
  minimal; once we identify the CI base image, the sandbox should extend it
  to match the CI environment exactly.
- **Cost telemetry**: `sandcastle.run()` returns iteration usage stats; the
  orchestrator could log these to `_state.json` per-task so operators see
  cumulative spend.
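
As an illustration of the file-overlap constraint in the batch-dispatch item, the sketch below greedily picks ready tasks whose file sets are disjoint. `--all-ready` is proposed work, not an existing flag, and the per-task `files` list is a hypothetical field; the sketch also covers only the "same files" half of the problem, not dependency ordering between stories.

```javascript
// Hypothetical input: each ready task declares the files it expects to touch.
// Nothing in this ADR says tasks carry such a list; it is an assumption.

/** Greedily select tasks with pairwise-disjoint file sets, in input order. */
function selectNonOverlapping(readyTasks) {
  const claimed = new Set();
  const batch = [];
  const deferred = [];
  for (const task of readyTasks) {
    const conflicts = task.files.some((file) => claimed.has(file));
    if (conflicts) {
      deferred.push(task.id); // wait for a later batch
      continue;
    }
    task.files.forEach((file) => claimed.add(file));
    batch.push(task.id);
  }
  return { batch, deferred };
}

const readyExample = [
  { id: "task-a", files: ["src/billing/index.ts"] },
  { id: "task-b", files: ["src/billing/index.ts", "src/billing/types.ts"] },
  { id: "task-c", files: ["src/search/index.ts"] },
];
console.log(selectNonOverlapping(readyExample));
```

A greedy pass is order-dependent and not guaranteed to find the largest possible batch, which is probably acceptable when deferred tasks simply run in the next round.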
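
The cost-telemetry item could look something like the following. This ADR says `sandcastle.run()` returns iteration usage stats but does not give their shape, so the `{ inputTokens, outputTokens }` records and the `usage` key in the state object are assumptions chosen for the example.

```javascript
// Assumed shapes: per-iteration `{ inputTokens, outputTokens }` records and a
// `usage` map in the state object. Neither is confirmed by this ADR.

/** Return a new state with this dispatch's tokens added to the task's total. */
function recordUsage(state, taskId, iterations) {
  const spent = iterations.reduce(
    (sum, it) => sum + it.inputTokens + it.outputTokens,
    0,
  );
  const previous = state.usage?.[taskId] ?? { dispatches: 0, tokens: 0 };
  return {
    ...state,
    usage: {
      ...state.usage,
      [taskId]: {
        dispatches: previous.dispatches + 1,
        tokens: previous.tokens + spent,
      },
    },
  };
}

// Two dispatches against the same task accumulate into one running total.
let telemetryState = { stories: [] };
telemetryState = recordUsage(telemetryState, "task-1", [
  { inputTokens: 1000, outputTokens: 200 },
  { inputTokens: 500, outputTokens: 100 },
]);
telemetryState = recordUsage(telemetryState, "task-1", [
  { inputTokens: 300, outputTokens: 50 },
]);
console.log(telemetryState.usage["task-1"]);
```

Returning a new object rather than mutating in place fits the v1 stance that the orchestrator suggests state changes instead of applying them.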