From 3f0d60e0828e29e277e9c3feb977839440676029 Mon Sep 17 00:00:00 2001
From: Danijel Martinek
Date: Wed, 13 May 2026 09:15:13 +0200
Subject: [PATCH] =?UTF-8?q?docs(adr):=20ADR-019=20=E2=80=94=20Sandcastle?=
 =?UTF-8?q?=20for=20agent=20orchestration?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures the decision to adopt @ai-hero/sandcastle as the orchestration
substrate for agent-driven development in this template. Records the
8-point decision (workspace dep, .sandcastle/ prompts, Dockerfile,
dispatch.mjs orchestrator, planning vs execute modes, generator-first
reviewer check, bring-your-own-key, per-task max-attempts), the four
alternatives considered (bare CLI / Copilot Workspace /
custom-from-scratch / no orchestrator), and four trade-offs (external
dep, token cost, Docker dependency, manual state mutation in v1).

Surfaces the decision at the top of README.md and AGENTS.md so new
contributors see the agent-driven framing before they hit the package
map or daily commands.
---
 AGENTS.md                                     | 118 ++++++++++-------
 README.md                                     |   4 +-
 ...-019-sandcastle-for-agent-orchestration.md | 125 ++++++++++++++++++
 3 files changed, 201 insertions(+), 46 deletions(-)
 create mode 100644 docs/decisions/adr-019-sandcastle-for-agent-orchestration.md

diff --git a/AGENTS.md b/AGENTS.md
index 5e76dc9..d9d7194 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,26 +2,37 @@
 
 This is a **Turborepo + pnpm monorepo** organized by vertical features. Each feature package owns its own Clean Architecture layers (entities, application, infrastructure, interface-adapters) and integrations (CMS collections, tRPC routers, UI components). Core packages provide foundation: primitives, design system, CMS composition, API aggregation, and tRPC client platform.
 
+## Agent-driven development
+
+This template assumes agents (Claude, Codex, etc.) will author most feature work. The orchestration substrate is [Sandcastle](https://github.com/mattpocock/sandcastle) — see [ADR-019](./docs/decisions/adr-019-sandcastle-for-agent-orchestration.md). Day-to-day entry points:
+
+- `pnpm work next` / `ready` / `blocked` — DAG-aware task selection from `docs/work/`
+- `pnpm work dispatch` — print the next dispatch plan (planning mode, no agent invoked)
+- `pnpm work dispatch --execute` — invoke sandcastle (requires `ANTHROPIC_API_KEY`)
+- `.sandcastle/` — 5 prompt templates (PRD eliciter, ADR eliciter, decomposer, implementer, reviewer); all enforce **generator-first** (`pnpm turbo gen ` over hand-rolling)
+
+Every feature has a `src/feature.manifest.ts` declaring its use cases. Every `bindProductionX(ctx)` and `bindDevSeedX(ctx)` self-asserts at its tail via `assertFeatureConformance(...)`. Five conformance gates catch drift at four latency tiers: TypeScript (0s), ESLint (<1s), boot (~3s), CI (`pnpm conformance`, `pnpm fallow`). See `docs/guides/conformance-quickref.md` and `docs/guides/runbook.md` for the full workflow.
+ --- ## Package Map -| Package | Tag | Purpose | -|---|---|---| -| `@repo/core-shared` | core | Generic primitives (Zod, env, Payload hooks/fields/blocks, tRPC init/context) | -| `@repo/core-ui` | core | Design system (atoms, molecules, generic organisms, templates) — **optional**, scaffold via `pnpm turbo gen core-package ui` | -| `@repo/core-audit` | core | DPA-compliant audit logging (4 impls, GDPR erasure, OTel correlation) — **optional**, scaffold via `pnpm turbo gen core-package audit` | -| `@repo/core-api` | core-composition | tRPC router aggregator — imports `@repo//api` only | -| `@repo/core-cms` | core-composition | Payload config aggregator — imports `@repo//cms` only | -| `@repo/core-trpc` | core-composition | Frontend tRPC client + framework-specific providers (Next.js, TanStack) | -| `@repo/auth` | feature | Users collection + sign-in/up/out | -| `@repo/blog` | feature | Articles collection + article use-cases | -| `@repo/media` | feature | Media collection + upload helpers | -| `@repo/marketing-pages` | feature | Pages collection + SiteSettings global | -| `@repo/navigation` | feature | Header global | -| `@repo/core-eslint` | tooling | Shared ESLint 9 flat configs (base, next, react-internal, boundaries) | -| `@repo/core-typescript` | tooling | Shared TypeScript base configs + Vitest base | -| `@repo/core-testing` | tooling | Shared test utilities (defineFactory, defineContractSuite, renderWithProviders, payload mocks) | +| Package | Tag | Purpose | +| ----------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------- | +| `@repo/core-shared` | core | Generic primitives (Zod, env, Payload hooks/fields/blocks, tRPC init/context) | +| `@repo/core-ui` | core | Design system (atoms, molecules, generic organisms, templates) — **optional**, scaffold via `pnpm turbo gen core-package ui` | +| `@repo/core-audit` | core | DPA-compliant audit 
logging (4 impls, GDPR erasure, OTel correlation) — **optional**, scaffold via `pnpm turbo gen core-package audit` | +| `@repo/core-api` | core-composition | tRPC router aggregator — imports `@repo//api` only | +| `@repo/core-cms` | core-composition | Payload config aggregator — imports `@repo//cms` only | +| `@repo/core-trpc` | core-composition | Frontend tRPC client + framework-specific providers (Next.js, TanStack) | +| `@repo/auth` | feature | Users collection + sign-in/up/out | +| `@repo/blog` | feature | Articles collection + article use-cases | +| `@repo/media` | feature | Media collection + upload helpers | +| `@repo/marketing-pages` | feature | Pages collection + SiteSettings global | +| `@repo/navigation` | feature | Header global | +| `@repo/core-eslint` | tooling | Shared ESLint 9 flat configs (base, next, react-internal, boundaries) | +| `@repo/core-typescript` | tooling | Shared TypeScript base configs + Vitest base | +| `@repo/core-testing` | tooling | Shared test utilities (defineFactory, defineContractSuite, renderWithProviders, payload mocks) | --- @@ -37,13 +48,13 @@ This is a **Turborepo + pnpm monorepo** organized by vertical features. Each fea ### Allowed dependency directions -| Tag | May depend on | -|---|---| -| app | app, core, core-composition, feature, tooling | -| core-composition | core, core-composition, feature, tooling | -| core | core, core-composition, tooling | -| feature | core, tooling | -| tooling | tooling | +| Tag | May depend on | +| ---------------- | --------------------------------------------- | +| app | app, core, core-composition, feature, tooling | +| core-composition | core, core-composition, feature, tooling | +| core | core, core-composition, tooling | +| feature | core, tooling | +| tooling | tooling | ### Composition exceptions @@ -254,8 +265,10 @@ Void controllers (e.g. 
`signOutController`, `deleteMediaController`) return `Pro DI binds each factory with `.toDynamicValue()`: ```typescript -bind(BLOG_SYMBOLS.IGetArticlesUseCase) - .toDynamicValue((ctx) => getArticlesUseCase(ctx.container.get(BLOG_SYMBOLS.IArticlesRepository))); +bind(BLOG_SYMBOLS.IGetArticlesUseCase).toDynamicValue( + (ctx) => + getArticlesUseCase(ctx.container.get(BLOG_SYMBOLS.IArticlesRepository)), +); ``` ### Feature-scoped tRPC error mapping (Plan 9, R13–R17) @@ -283,14 +296,14 @@ The router then uses `blogProcedure.input(xInputSchema)` for every procedure — Each feature package exposes exactly these subpath exports: -| Subpath | What it exports | Who consumes | -|---|---|---| -| `.` (root) | Contracts only: types, errors, schemas, `IUseCase` / `IController` aliases, router type, constants | Any consumer | -| `./ui` | Query builders (`queryOptions`), UI components | App packages | -| `./api` | tRPC router (`xRouter` + `XRouter` type) | `@repo/core-api` only | -| `./cms` | Payload collections | `@repo/core-cms` only | -| `./di/bind-production` | App boot side-effect — swaps mock for real Payload impl | App packages only | -| `./di/bind-dev-seed` | App boot side-effect — swaps empty mock for populated mock | App packages, storybook | +| Subpath | What it exports | Who consumes | +| ---------------------- | -------------------------------------------------------------------------------------------------- | ----------------------- | +| `.` (root) | Contracts only: types, errors, schemas, `IUseCase` / `IController` aliases, router type, constants | Any consumer | +| `./ui` | Query builders (`queryOptions`), UI components | App packages | +| `./api` | tRPC router (`xRouter` + `XRouter` type) | `@repo/core-api` only | +| `./cms` | Payload collections | `@repo/core-cms` only | +| `./di/bind-production` | App boot side-effect — swaps mock for real Payload impl | App packages only | +| `./di/bind-dev-seed` | App boot side-effect — swaps empty mock for populated mock | App 
packages, storybook | Apps import schemas/types from `@repo/` (root) and React Query builders from `@repo//ui`. Deep source paths are not accessible — the `exports` map enforces this. @@ -304,7 +317,10 @@ Feature packages that need Payload receive the `SanitizedConfig` via constructor export class ArticlesRepository implements IArticlesRepository { constructor(private config: SanitizedConfig) {} - async getArticles(options?: { status?: string; limit?: number }): Promise { + async getArticles(options?: { + status?: string; + limit?: number; + }): Promise { const payload = await getPayload({ config: this.config }); // ... } @@ -350,8 +366,17 @@ export async function bindAllProduction(): Promise { export async function bindAllDevSeed(): Promise { const { tracer, logger } = resolveInstrumentation(); - const ctx: BindContext = { - tracer, logger, bus, queue, realtime, realtimeRegistry, + const ctx: BindContext< + IEventBus, + IRealtimeBroadcaster, + IRealtimeHandlerRegistry + > = { + tracer, + logger, + bus, + queue, + realtime, + realtimeRegistry, }; await bindDevSeedAuth(ctx); @@ -387,8 +412,8 @@ See `docs/guides/conformance-quickref.md` for the canonical pattern; the generat Three rules: - **E0:** Events are for cross-feature decoupling. In-feature reactions are direct use-case calls — do not use the bus. -- **E1:** Event contracts are exported from the publisher's root; handlers are private to the consumer's bind-* files (never re-exported, ESLint-enforced). -- **J0:** Jobs are for *deferred* work, not abstraction. Synchronous code stays synchronous. +- **E1:** Event contracts are exported from the publisher's root; handlers are private to the consumer's bind-\* files (never re-exported, ESLint-enforced). +- **J0:** Jobs are for _deferred_ work, not abstraction. Synchronous code stays synchronous. `@repo/core-events` provides `IEventBus` (`InMemoryEventBus` for dev/test, `PayloadJobsEventBus` for prod). 
`@repo/core-shared/jobs` provides `IJobQueue` (`InMemoryJobQueue` / `PayloadJobQueue`). Both are swapped by `bindAll()` using the same `USE_DEV_SEED` / `NODE_ENV` rules as repositories. @@ -405,7 +430,7 @@ See `docs/guides/events-and-jobs.md` and `docs/decisions/adr-015-events-and-jobs Three rules: - **R0:** Realtime is for state delivery, not for replacing tRPC. Persistent operations with request/response semantics belong on tRPC procedures. Use realtime when the server needs to push without a request, or the data is too high-frequency for HTTP. -- **R1:** Channel descriptors are exported; handlers are private. A feature's `realtime/.channel.ts` is re-exported from the package root barrel; `realtime/handlers/*.handler.ts` is wired only in the feature's own bind-* files and never re-exported (ESLint-enforced via `no-realtime-handler-reexport`). +- **R1:** Channel descriptors are exported; handlers are private. A feature's `realtime/.channel.ts` is re-exported from the package root barrel; `realtime/handlers/*.handler.ts` is wired only in the feature's own bind-\* files and never re-exported (ESLint-enforced via `no-realtime-handler-reexport`). - **R2:** `socket.io` lives in one package only. Feature packages MUST NOT `import "socket.io"` or `import "socket.io-client"`. Allowlist: `packages/core-realtime/src/socket-io-*.ts` + `apps/*/server.ts`. ESLint rule `no-direct-socket-io` enforces this. `@repo/core-realtime` provides `IRealtimeBroadcaster` (server → client), `IRealtimeHandlerRegistry` (client → server), and the `SocketIORealtimeServer` adapter. `apps/web-next/server.ts` replaces `next start`/`next dev` with a custom Node http server hosting both Next.js and Socket.IO on port 3000. @@ -421,6 +446,7 @@ See `docs/guides/realtime.md` and `docs/decisions/adr-016-realtime-layer.md`. Substrate: **OpenTelemetry SDK** (ADR-017). Sentry is wired as the exporter via `@sentry/opentelemetry`. 
Vendor swaps are exporter swaps — feature code never touches Sentry or OTel SDK directly. **Symbols (in `core-shared/instrumentation/symbols.ts`):** + - `INSTRUMENTATION_SYMBOLS.ITracer` — bound to `ITracer` (`NoopTracer` / `OtelTracer`) - `INSTRUMENTATION_SYMBOLS.ILogger` — bound to `ILogger` (`NoopLogger` / `OtelLogger`) - `INSTRUMENTATION_SYMBOLS.IMetrics` — bound to `IMetrics` (`NoopMetrics` / `OtelMetrics`) @@ -483,12 +509,12 @@ const wrappedCtrl = withSpan( **Capture rules** (each error captured exactly once via the `__sentryReported` flag from `core-shared/instrumentation/reported-flag.ts`): -| Layer | Captures | Doesn't capture | -|---|---|---| -| Repository | Infra/Payload errors that originate here (inline in catch) | Bubbled errors | -| Use case | Business-rule violations + output-schema failures originated in this body (via `withCapture`) | Errors from repos — flag set, `withCapture` bails | -| Controller | `InputParseError` from `safeParse` failure (via `withCapture`) | Errors from use cases — flag set, `withCapture` bails | -| `defineErrorMiddleware` | Nothing — maps domain → TRPCError only | — | +| Layer | Captures | Doesn't capture | +| ----------------------- | --------------------------------------------------------------------------------------------- | ----------------------------------------------------- | +| Repository | Infra/Payload errors that originate here (inline in catch) | Bubbled errors | +| Use case | Business-rule violations + output-schema failures originated in this body (via `withCapture`) | Errors from repos — flag set, `withCapture` bails | +| Controller | `InputParseError` from `safeParse` failure (via `withCapture`) | Errors from use cases — flag set, `withCapture` bails | +| `defineErrorMiddleware` | Nothing — maps domain → TRPCError only | — | **Boundary rules (eslint-enforced, R40 + R52):** Feature packages MUST NOT `import "@sentry/*"` or `import "@opentelemetry/sdk-*"`. 
Allowlists: @@ -499,6 +525,7 @@ Feature packages MUST NOT `import "@sentry/*"` or `import "@opentelemetry/sdk-*" The vendor-neutral API packages (`@opentelemetry/api`, `@opentelemetry/api-logs`) are unrestricted within `core-shared/instrumentation/`. **Test rules:** + - Default to `NoopTracer` / `NoopLogger` / `NoopMetrics` (constructor defaults) - Assert spans/captures by injecting `RecordingTracer` / `RecordingLogger` / `RecordingMetrics` from `@repo/core-testing/instrumentation` - Real Sentry SDK + OTel SDK MUST NOT initialize during tests (guarded by `core-testing/setup/no-instrumentation.ts`; old alias `no-sentry` kept for one release) @@ -518,6 +545,7 @@ The vendor-neutral API packages (`@opentelemetry/api`, `@opentelemetry/api-logs` - **TDD Workflow** — `docs/guides/tdd-workflow.md` — red-green-refactor cycle, mocking decision tree, coverage targets Per-package documentation lives in each `AGENTS.md`: + - `packages/core-shared/AGENTS.md` - `packages/core-api/AGENTS.md`, `core-cms/AGENTS.md`, `core-trpc/AGENTS.md` - `packages/core-ui/AGENTS.md` (optional — generated by `pnpm turbo gen core-package ui`; see `turbo/generators/templates/core-package/ui/AGENTS.md.hbs`) diff --git a/README.md b/README.md index 8784754..6e163ea 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,11 @@ Turborepo + pnpm monorepo organised by vertical features, with an **agent-first workflow** and **five conformance gates**. +This template is built for **agent-driven development**. [Sandcastle](https://github.com/mattpocock/sandcastle) is the orchestration substrate; `pnpm work dispatch` is the entry point. See [ADR-019](./docs/decisions/adr-019-sandcastle-for-agent-orchestration.md) for the decision rationale and [`docs/guides/runbook.md`](./docs/guides/runbook.md) for end-to-end usage. + ## Start here -Read [`docs/guides/runbook.md`](./docs/guides/runbook.md) — day-1 onboarding (prerequisites, env vars, daily commands, troubleshooting). 
+Read [`docs/guides/runbook.md`](./docs/guides/runbook.md) — day-1 onboarding (prerequisites, env vars, daily commands, troubleshooting, **Using Sandcastle for agent dispatch**).
 
 ## Quick reference
 
diff --git a/docs/decisions/adr-019-sandcastle-for-agent-orchestration.md b/docs/decisions/adr-019-sandcastle-for-agent-orchestration.md
new file mode 100644
index 0000000..0c2f4a3
--- /dev/null
+++ b/docs/decisions/adr-019-sandcastle-for-agent-orchestration.md
@@ -0,0 +1,125 @@
+# ADR-019 — Sandcastle for Agent Orchestration
+
+**Status:** Accepted
+**Date:** 2026-05-13
+**Spec:** docs/architecture/agent-first-workflow-and-conformance.md
+**Companion guide:** docs/guides/runbook.md ("Using Sandcastle for agent dispatch")
+**Related:** ADR-011 (TDD foundation), ADR-012 (Lazar conformance), ADR-015 (events and jobs)
+
+## Context
+
+This template is designed for **agent-driven feature development**. The conformance
+system (ADR-012 + the post-ADR conformance-system-v1 epic) gives agents a tight,
+layered feedback loop — type errors in 0s, lint in <1s, boot assertion in ~3s, CI
+gates in ~120s. The remaining substrate question is: how does an agent actually
+get dispatched against a task?
+
+Three pieces are needed:
+
+1. **A way to invoke an agent** (Claude / Codex) with a task description,
+   inside a sandbox so the agent can't break the host while iterating.
+2. **A way to capture the agent's commits** so a reviewer agent can inspect
+   the diff and approve or reject.
+3. **A way to compose the above into a per-task dispatch loop** with retry
+   semantics, branch management, and integration into the existing
+   docs/work/ task system.
+
+Without a substrate that handles all three, agentic development falls back to
+copy-paste-prompt-by-hand, which is slow and error-prone.
+
+## Decision
+
+Adopt [Sandcastle](https://github.com/mattpocock/sandcastle) (`@ai-hero/sandcastle`)
+as the agent-orchestration substrate. `pnpm work dispatch` is the entry point.
+
+Concretely:
+
+1. **`@ai-hero/sandcastle` is a workspace-root devDependency.** Declared at
+   `^2.73.0` at adoption; the caret range admits later 2.x minor and patch releases.
+2. **`.sandcastle/` holds the canonical prompt templates.** Five role-specific
+   prompts: PRD eliciter, ADR eliciter, decomposer, implementer, reviewer.
+   Each enforces the **generator-first** rule (prefer `pnpm turbo gen `
+   over hand-rolling — see saved memory `generator-first-for-agents`).
+3. **`.sandcastle/Dockerfile`** is the sandbox baseline (node:22-bookworm-slim
+   plus pnpm via corepack). The agent runs `pnpm install --frozen-lockfile` as
+   its first step per the implementer prompt.
+4. **`scripts/work/dispatch.mjs` is the orchestrator.** It reads `_state.json`,
+   finds the first ready story's first unchecked AC bullet, builds a task spec,
+   and calls `sandcastle.run({ promptFile, promptArgs: { TASK_FILE_CONTENT } })`
+   for the implementer, then again for the reviewer with `{{DIFF}}`. The
+   orchestrator does NOT mutate state in v1 — it prints suggested mutations
+   for the human to apply.
+5. **Two modes:** `pnpm work dispatch` (planning, no agent invoked) and
+   `pnpm work dispatch --execute` (real sandcastle call, requires
+   `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`).
+6. **Reviewer agent verifies generator-first.** Hand-rolled output that should
+   have been a `pnpm turbo gen ` invocation is grounds for rejection.
+7. **Bring-your-own-key for cost control.** No bundled API key. Agents only
+   dispatch when the operator explicitly provides credentials.
+8. **Per-task max-attempts honoured (v2).** Each task's frontmatter may carry
+   `max-attempts: N` to bound the implementer↔reviewer retry loop. Default 3.
+
+## Alternatives considered
+
+- **Bare Claude Code / Codex CLI invocation per task** — rejected. No sandbox
+  isolation; no consistent prompt template surface; no built-in branch
+  management; no reviewer-loop primitive.
+- **GitHub Copilot Workspace / native CI agent** — rejected. Vendor lock-in;
+  workflow lives outside the repo; no local equivalent for development time.
+- **Custom orchestrator built from scratch on the Anthropic SDK** — rejected.
+  Sandcastle already solves sandbox + branch + structured-output extraction;
+  rebuilding it is not the leverage point.
+- **No orchestrator — humans dispatch each task manually via copy-paste** —
+  rejected as the steady-state mode, but supported as a fallback via planning
+  mode (`pnpm work dispatch` without `--execute`).
+- **A different sandbox provider (Vercel sandboxes, Daytona, native fly.io)**
+  — sandcastle is provider-agnostic; the choice of provider sits behind the
+  `SANDCASTLE_PROVIDER` env var and can change without disrupting prompts or
+  orchestrator code. Default is Docker.
+
+## Consequences
+
+### Positive
+
+- **Per-task isolation.** Each implementer dispatch runs in its own Docker
+  sandbox + sandbox branch. Bad agent output stays in the branch; merge to
+  `main` is gated by the reviewer agent + the full 5-gate stack.
+- **Provider-agnostic.** Switching from Claude to Codex (or to a future
+  agent runtime) is a one-line change to the prompt's `agent` parameter.
+- **Composable with existing workflow.** `pnpm work` CLI already reads
+  `_state.json` and the docs/work/ markdown; dispatch is one more subcommand
+  layered on top.
+- **Cost-aware default.** Planning mode invokes no agent; only `--execute`
+  spends tokens. Operators choose when to escalate from plan to execute.
+- **Recoverable failure modes.** If an implementer goes off-rails, its diff
+  lives on a sandbox branch — review, reject, re-dispatch with notes.
+
+### Negative / accepted trade-offs
+
+- **External dependency on sandcastle.** If the project stalls, we either pin
+  and maintain a fork or migrate to another orchestrator. Sandcastle is small
+  enough (~3KLOC) that a fork is manageable.
+- **Token cost is real.** A complex task can use 100K-200K tokens per
+  implementer + reviewer round-trip. Operators budget per-dispatch; the
+  planning mode + the optional `max-attempts` frontmatter cap exposure.
+- **Docker dependency for the default sandbox.** Without Docker (or a
+  provider swap), `--execute` won't run. Documented in the runbook.
+- **State mutation is manual in v1.** The orchestrator prints suggested
+  state mutations; a human ticks the AC bullet + commits. Auto-mutation is
+  v2 work, gated on confidence that the reviewer's decision can be trusted
+  without human inspection.
+
+### Follow-up work
+
+- **Auto state mutation** — when the reviewer agent's decision is approve,
+  the orchestrator could automatically tick the AC bullet + commit. Currently
+  manual; promote when reviewer confidence is established empirically.
+- **Multi-task batch dispatch** — `pnpm work dispatch --all-ready` would
+  fan out across all ready stories. Requires DAG-aware concurrency
+  (no two implementers touching the same files).
+- **Sandcastle CI image alignment** — the `.sandcastle/Dockerfile` is
+  minimal; once we identify the CI base image, the sandbox should extend it
+  to match the CI environment exactly.
+- **Cost telemetry** — `sandcastle.run()` returns iteration usage stats; the
+  orchestrator could log these to `_state.json` per-task so operators see
+  cumulative spend.
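
Reviewer note (illustrative, not part of the patch): decision items 4 and 8 describe task selection and a bounded implementer/reviewer retry loop. The sketch below shows one way that logic might look. The `Story` shape, the `- [ ]` checkbox format for AC bullets, and every function name are assumptions made for illustration; the actual `_state.json` schema and the `sandcastle.run()` call are not reproduced here, so the agent calls are injected as plain callbacks.

```typescript
// Hypothetical story shape: the real `_state.json` schema is not shown in the ADR.
interface Story {
  id: string;
  status: "ready" | "blocked" | "done";
  body: string; // markdown assumed to hold "- [ ]" / "- [x]" acceptance criteria
  maxAttempts?: number; // assumed to mirror the `max-attempts` frontmatter key
}

interface TaskSpec {
  storyId: string;
  criterion: string;
  maxAttempts: number;
}

const DEFAULT_MAX_ATTEMPTS = 3; // the default stated in decision item 8

// Return the text of the first unchecked "- [ ]" bullet, or null if none remain.
function firstUncheckedCriterion(markdown: string): string | null {
  for (const line of markdown.split("\n")) {
    const match = /^\s*[-*]\s+\[ \]\s+(.*)$/.exec(line);
    if (match) return (match[1] ?? "").trim();
  }
  return null;
}

// Pick the first ready story that still has an unchecked criterion.
function nextTask(stories: Story[]): TaskSpec | null {
  for (const story of stories) {
    if (story.status !== "ready") continue;
    const criterion = firstUncheckedCriterion(story.body);
    if (criterion === null) continue;
    return {
      storyId: story.id,
      criterion,
      maxAttempts: story.maxAttempts ?? DEFAULT_MAX_ATTEMPTS,
    };
  }
  return null;
}

type Verdict = { approved: boolean; notes: string };

// Bounded implementer/reviewer loop. Both callbacks are injected stand-ins
// for the real agent invocations; nothing here mutates state.
async function dispatch(
  task: TaskSpec,
  implement: (task: TaskSpec, notes: string) => Promise<string>,
  review: (diff: string) => Promise<Verdict>,
): Promise<{ approved: boolean; attempts: number }> {
  let notes = "";
  for (let attempt = 1; attempt <= task.maxAttempts; attempt++) {
    const diff = await implement(task, notes);
    const verdict = await review(diff);
    if (verdict.approved) return { approved: true, attempts: attempt };
    notes = verdict.notes; // feed reviewer feedback into the next attempt
  }
  return { approved: false, attempts: task.maxAttempts };
}
```

Injecting the implementer and reviewer as callbacks keeps the selection and retry logic testable without Docker or an API key, which is in the spirit of the planning mode described above.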