agentic-dev-template/docs/guides/runbook.md
Danijel Martinek 7737358509 fix(sandcastle): make Dockerfile match sandcastle's expected shape + document macOS keychain quirk
Two separate sandbox blockers surfaced when the user tried
`pnpm work decompose --execute`:

1. **Container died on exec** — our Dockerfile had:
     - WORKDIR /workspace + CMD ["bash"]
     - No `agent` user (sandcastle execs as the UID:GID it built with)
     - node:22-bookworm-slim (missing some build deps the install
       script wants)
   Sandcastle expects:
     - A non-root `agent` user with home at /home/agent (sandcastle
       does `git config --global --add safe.directory /home/agent/workspace`,
       which fails if the user doesn't exist or the container exited)
     - ENTRYPOINT ["sleep", "infinity"] so the container survives
       the gap between sandcastle creating it and exec'ing in
   Replaced .sandcastle/Dockerfile with the shape `sandcastle init`
   would generate (verified against
   node_modules/@ai-hero/sandcastle/dist/InitService.js):
     - node:22-bookworm (full, not slim) for build tooling
     - apt-get installs git + curl + jq
     - corepack-pinned pnpm@9
     - ARG AGENT_UID=1000 + AGENT_GID=1000; sandcastle's
       build-image passes the host's UID/GID by default
     - `groupmod -o -g $AGENT_GID node` + `usermod -o ... node` —
       the `-o` (non-unique) flag is required because macOS hosts
       have UID:501 GID:20, and GID 20 collides with Debian's
       `dialout` group in the base image (without -o, groupmod
       fails with "GID '20' already exists")
     - USER ${AGENT_UID}:${AGENT_GID}, then install Claude Code CLI
       via the official installer
     - ENV PATH includes /home/agent/.local/bin
     - WORKDIR /home/agent (sandcastle overrides per-run anyway)
     - ENTRYPOINT ["sleep", "infinity"] keeps the container alive

2. **"Not logged in · Please run /login"** inside the container —
   Claude Code on macOS stores credentials in the Keychain, NOT in
   ~/.claude/.credentials.json. Sandcastle's bind-mount of ~/.claude
   finds nothing usable. Documented the workaround:
     - README.md "Sandcastle setup (one-time)" — macOS-specific
       block with the `security find-generic-password ... > ~/.claude/.credentials.json`
       one-liner + chmod 600 + the security trade-off (plaintext
       file vs keychain isolation)
     - docs/guides/runbook.md "Using Sandcastle → Prerequisites" —
       step 3 (Authentication) gets a "macOS quirk" subsection with
       the same extraction one-liner + the API-key fallback as the
       alternative path
     - scripts/work/{dispatch,decompose}.mjs — when the sandcastle
       error matches /Not logged in|Please run \/login/ AND we're on
       darwin, the dispatcher prints the keychain-extraction
       commands + the API-key fallback inline above the generic
       "See runbook" line, so future agents discover the fix at the
       failure site

The image rebuilds clean (`pnpm exec sandcastle docker
build-image`) at ~1.95GB and the container survives sandcastle's
exec — confirmed by reaching the "Not logged in" stage (which is
the next-layer issue, not the Dockerfile issue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:02:34 +02:00


Developer Runbook

You just cloned this repo. This is the only doc you need to read end-to-end. Everything else is reference.


Prerequisites

| Tool | Version | Why |
|---|---|---|
| Node.js | 22+ | Runtime for all apps + scripts |
| pnpm | 9+ | Package manager (workspace-aware) |
| Docker | 24+ | Local Postgres + sandcastle sandboxes |
| Git | 2.40+ | Version control + worktrees |

Recommended editor: VS Code or Cursor with the official TypeScript, ESLint, and Prettier extensions.
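The version floors in the table can be checked mechanically. The sketch below is hypothetical and is not one of the repo's scripts. It assumes the input is the text printed by `node --version`, `pnpm --version`, `docker --version` and `git --version`, and it compares the first major.minor[.patch] it finds against the minimums listed above.

```typescript
// Hypothetical prerequisite check; not part of the repo's scripts.
// Input is assumed to be the raw output of `<tool> --version`.

type Version = [number, number, number];

// Pull the first "major.minor[.patch]" run out of a tool's version output.
function parseVersion(output: string): Version | null {
  const m = output.match(/(\d+)\.(\d+)(?:\.(\d+))?/);
  if (!m) return null;
  return [Number(m[1]), Number(m[2]), Number(m[3] ?? 0)];
}

// Compare [major, minor, patch] left to right.
function atLeast(actual: Version, minimum: Version): boolean {
  for (let i = 0; i < 3; i++) {
    if (actual[i] !== minimum[i]) return actual[i] > minimum[i];
  }
  return true;
}

// Minimums from the Prerequisites table.
const minimums: Record<string, Version> = {
  node: [22, 0, 0],
  pnpm: [9, 0, 0],
  docker: [24, 0, 0],
  git: [2, 40, 0],
};

function meetsMinimum(tool: string, output: string): boolean {
  const v = parseVersion(output);
  const min = minimums[tool];
  return v !== null && min !== undefined && atLeast(v, min);
}

console.log(meetsMinimum("git", "git version 2.43.0")); // true
console.log(meetsMinimum("node", "v20.11.1")); // false
```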


First-time setup

# 1. Clone + install
git clone <repo-url> template-vertical
cd template-vertical
pnpm install

# 2. Start Postgres (background)
docker compose up -d

# 3. Copy env template and fill in secrets
cp .env.example .env
# Edit .env (see "Environment variables" section below for what each one does)

# 4. Verify the gate stack is green
pnpm typecheck
pnpm test
pnpm lint
pnpm conformance
pnpm fallow
pnpm turbo boundaries

All six should exit 0. If any fails on a fresh clone, file an issue — the main branch is supposed to stay green.

# 5. Start the dev servers
pnpm dev

This runs Next.js (3000), Payload CMS (3001), TanStack Start (3002), and Storybook (6006) in parallel. The bindAll() dispatcher in each app picks the dev-seed binders by default (mock repositories, no Payload connection needed beyond Postgres).


Daily commands

# Development
pnpm dev                          # all dev servers
pnpm dev --filter @repo/web-next  # one app

# Tests
pnpm test                         # everything
pnpm test --filter @repo/auth     # one package
pnpm test:e2e                     # Playwright e2e
pnpm test:stories                 # Storybook smoke tests
pnpm test:visual                  # visual regression (Playwright screenshots)

# Linting + type checking
pnpm typecheck                    # tsc across all packages
pnpm lint                         # ESLint across all packages
pnpm format                       # Prettier write
pnpm format:check                 # Prettier check (CI mode)

# Conformance gates
pnpm conformance                  # cross-feature event closure
pnpm fallow                       # whole-codebase: dead exports, dupes, complexity
pnpm fallow:audit                 # AI-change audit (run before commits)

# Boundary validation
pnpm turbo boundaries             # workspace dependency graph

# Work system
pnpm work status                  # tree of epics + stories
pnpm work next                    # next ready story
pnpm work ready                   # all ready stories
pnpm work blocked                 # blocked stories + what they wait on
pnpm work rebuild-state           # regenerate docs/work/_state.json
pnpm work dispatch                # print next dispatch plan
pnpm work dispatch --execute      # invoke sandcastle (subscription or API key; see "Using Sandcastle" below)

Environment variables

Copy .env.example to .env and fill what you need. NOT every variable is required for pnpm dev — defaults are dev-friendly.

Required for pnpm dev

| Var | Example | Why |
|---|---|---|
| DATABASE_URL | postgresql://postgres:postgres@localhost:5433/template | Postgres connection (docker compose default) |
| PAYLOAD_SECRET | your-secret-here | Payload CMS encryption key (any random 32+ char string in dev) |

Optional — app URLs (defaults work in dev)

| Var | Default | Why |
|---|---|---|
| NEXT_PUBLIC_APP_URL | http://localhost:3000 | Public-facing web-next URL |
| CMS_URL | http://localhost:3001 | Payload CMS URL |
| USE_DEV_SEED | true in dev | Force dev-seed binders (mock repos) instead of Payload |
| NODE_ENV | inherited | production flips bind dispatcher to real Payload |
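A minimal sketch of how a bindAll()-style dispatcher could combine these two variables. The real dispatcher lives in each app. The precedence used here is an assumption: an explicit USE_DEV_SEED value wins, otherwise NODE_ENV=production selects Payload. pickBinders and BindEnv are hypothetical names.

```typescript
// Hypothetical binder selection; the precedence is an assumption, not the
// apps' confirmed behaviour.

type BinderKind = "dev-seed" | "payload";

interface BindEnv {
  NODE_ENV?: string;
  USE_DEV_SEED?: string;
}

function pickBinders(env: BindEnv): BinderKind {
  // An explicit USE_DEV_SEED decides on its own.
  if (env.USE_DEV_SEED === "true") return "dev-seed";
  if (env.USE_DEV_SEED === "false") return "payload";
  // Otherwise production flips to the real Payload-backed repositories,
  // and everything else (development, test, unset) stays on mock repos.
  return env.NODE_ENV === "production" ? "payload" : "dev-seed";
}
```

This keeps `pnpm dev` on mock repositories with no configuration, which matches the "defaults are dev-friendly" note above.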

Optional — Sentry observability (no DSN = no-op tracer/logger)

| Var | Why |
|---|---|
| WEB_NEXT_SENTRY_DSN | Server-side OTel + Sentry for web-next |
| NEXT_PUBLIC_WEB_NEXT_SENTRY_DSN | Browser Sentry for web-next |
| CMS_SENTRY_DSN | Server-side for Payload CMS |
| WEB_TANSTACK_SENTRY_DSN | Server-side for TanStack Start |
| VITE_WEB_TANSTACK_SENTRY_DSN | Browser-side for TanStack Start |
| SENTRY_AUTH_TOKEN, SENTRY_ORG, SENTRY_PROJECT_* | Source-map upload at build time |
| SENTRY_TRACES_SAMPLE_RATE | OTel trace sample rate (0.1 recommended in dev) |
| SENTRY_ENVIRONMENT | development / staging / production |
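The "no DSN = no-op" rule means an unset DSN disables observability cleanly instead of raising an error. Here is a small sketch of that pattern. Logger, createLogger and the injected send function are hypothetical; send stands in for the real Sentry/OTel client.

```typescript
// Hypothetical logger factory illustrating "no DSN = no-op".

interface Logger {
  readonly enabled: boolean;
  error(message: string): void;
}

const noopLogger: Logger = {
  enabled: false,
  error: () => {
    // Intentionally does nothing when observability is off.
  },
};

function createLogger(dsn: string | undefined, send: (msg: string) => void): Logger {
  // A missing or blank DSN means "observability off".
  if (!dsn || dsn.trim() === "") return noopLogger;
  return { enabled: true, error: (message) => send(message) };
}
```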

Optional — Git commit SHA for releases

| Var | Why |
|---|---|
| VERCEL_GIT_COMMIT_SHA / VITE_GIT_COMMIT_SHA / NEXT_PUBLIC_VERCEL_GIT_COMMIT_SHA | Surfaces commit SHA in Sentry releases + UI footers |

Optional — core-audit (only when gen core-package audit is scaffolded)

| Var | Why |
|---|---|
| AUDIT_PSEUDONYM_SALT | Salt for the audit log's GDPR-erasure pseudonymisation (production only — must be a stable secret) |

Optional — sandcastle dispatch (only when running pnpm work dispatch --execute)

Auth is resolved automatically. Subscription (via ~/.claude/) is the primary path; API key is the fallback.

| Var | Why |
|---|---|
| ANTHROPIC_API_KEY | Claude API key — fallback when no ~/.claude/ present; not needed for subscribers |
| OPENAI_API_KEY | OpenAI/Codex alternative (fallback) |
| SANDCASTLE_CLAUDE_CREDS_DIR | Override host Claude creds path (default: ~/.claude/) |
| GITHUB_TOKEN | GitHub access for PR creation by the orchestrator |
| SANDCASTLE_PROVIDER | docker (default) / podman / vercel |
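The resolution order above (subscription via the creds directory, then ANTHROPIC_API_KEY, then OPENAI_API_KEY) can be sketched as a pure function. This is a reconstruction, not the dispatcher's code. The directory check is injected as dirExists so the logic runs without touching the filesystem, and resolveAuth, AuthMode and AuthEnv are hypothetical names.

```typescript
// Hypothetical sketch of the auth resolution order described above.

type AuthMode =
  | { kind: "subscription"; credsDir: string }
  | { kind: "api-key"; provider: "anthropic" | "openai" };

interface AuthEnv {
  HOME?: string;
  SANDCASTLE_CLAUDE_CREDS_DIR?: string;
  ANTHROPIC_API_KEY?: string;
  OPENAI_API_KEY?: string;
}

function resolveAuth(env: AuthEnv, dirExists: (path: string) => boolean): AuthMode {
  // 1. Subscription: the override path wins, else ~/.claude.
  const credsDir = env.SANDCASTLE_CLAUDE_CREDS_DIR ?? `${env.HOME ?? "~"}/.claude`;
  if (dirExists(credsDir)) return { kind: "subscription", credsDir };
  // 2. API-key fallbacks, Anthropic first.
  if (env.ANTHROPIC_API_KEY) return { kind: "api-key", provider: "anthropic" };
  if (env.OPENAI_API_KEY) return { kind: "api-key", provider: "openai" };
  // 3. Nothing usable: mirror the dispatcher's "--execute requires either" error.
  throw new Error("--execute requires either: 1. Claude Code logged in on host, 2. ANTHROPIC_API_KEY");
}
```

A check that only looks for the directory cannot tell a macOS ~/.claude/ (present, but with credentials held in the Keychain) from a usable one. That gap is the macOS quirk covered under "Using Sandcastle for agent dispatch".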

The agent-first workflow

This template enforces a manifest-first, generator-driven, gate-protected workflow.

When adding a new feature

pnpm turbo gen feature <name>

This emits:

  • packages/<name>/src/feature.manifest.ts — the conformance manifest (use cases, audits, publishes, consumes)
  • packages/<name>/src/di/bind-production.ts with assertFeatureConformance(...) at the tail (refuses to boot on drift)
  • Mock repository, factory, seed, entity, use-case, controller, tests — full canonical shape
  • packages/<name>/src/index.ts exports
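To show the "refuses to boot on drift" idea, here is a deliberately simplified sketch. The generated feature.manifest.ts and the repo's assertFeatureConformance(...) carry more than this (audits and brands, among others), so the FeatureManifest shape and the check below are assumptions that cover use-case bindings only.

```typescript
// Hypothetical, simplified manifest + boot-time assertion.

interface FeatureManifest {
  feature: string;
  useCases: readonly string[];
  publishes: readonly string[];
  consumes: readonly string[];
}

class ConformanceError extends Error {}

// Throw when a declared use case has no binding, or a binding exists
// that the manifest never declared.
function assertFeatureConformance(
  manifest: FeatureManifest,
  bindings: Record<string, unknown>,
): void {
  const declared = new Set(manifest.useCases);
  const bound = new Set(Object.keys(bindings));
  const missing = Array.from(declared).filter((u) => !bound.has(u));
  const undeclared = Array.from(bound).filter((u) => !declared.has(u));
  if (missing.length > 0 || undeclared.length > 0) {
    throw new ConformanceError(
      `${manifest.feature}: missing bindings [${missing.join(", ")}], ` +
        `undeclared bindings [${undeclared.join(", ")}]`,
    );
  }
}
```

Because the call sits at the tail of bind-production.ts, any drift stops the process at startup instead of surfacing later as a runtime bug.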

After scaffolding, the four-step ordering for any new use case:

  1. Manifest entry — declare the use case in feature.manifest.ts
  2. Contracts — export xInputSchema, xOutputSchema, IXUseCase (factory body throws not implemented)
  3. Tests (red) — write the failing test
  4. Implementation (green) — fill the factory body

The five conformance gates catch drift at every step. See docs/guides/conformance-quickref.md for the manifest field reference.
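A sketch of step 2 (contracts) for a hypothetical renameProject use case. Every name here is illustrative and follows the xInputSchema / IXUseCase naming pattern. A hand-written parse function stands in for whatever schema library the generator actually emits. The stub factory type-checks but throws, so any test written in step 3 starts red.

```typescript
// Hypothetical contracts for a `renameProject` use case (step 2 only).

interface RenameProjectInput { projectId: string; name: string }
interface RenameProjectOutput { projectId: string; name: string }

// Stand-in for xInputSchema: validate unknown input at the boundary.
const renameProjectInputSchema = {
  parse(value: unknown): RenameProjectInput {
    const { projectId, name } = (value ?? {}) as Partial<RenameProjectInput>;
    if (typeof projectId !== "string" || typeof name !== "string" || name.length === 0) {
      throw new Error("invalid RenameProjectInput");
    }
    return { projectId, name };
  },
};

type IRenameProjectUseCase = (input: RenameProjectInput) => Promise<RenameProjectOutput>;

// The factory exists and satisfies the type, but the body is a stub
// until step 4 fills it in.
function makeRenameProjectUseCase(): IRenameProjectUseCase {
  return async () => {
    throw new Error("not implemented");
  };
}
```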

When adding cross-feature primitives

pnpm turbo gen event             # event contract or handler (needs gen core-package events)
pnpm turbo gen job               # background job
pnpm turbo gen realtime          # realtime channel or handler (needs gen core-package realtime)
pnpm turbo gen core-package <x>  # optional core package (events/realtime/trpc/ui/audit)
pnpm turbo gen core-ui-component <x>  # atomic-design component (needs gen core-package ui)

Always prefer generators over hand-rolling. The generators emit the canonical shape; hand-rolled code drifts from generator output and breaks the CI scaffold-drift check.

Tracking work

The repo uses docs/work/ for epic/story/task tracking:

docs/work/
├── README.md
├── _state.json              # derived, regenerated by pre-commit hook
├── prds/                    # PRDs go here
├── _templates/              # markdown templates
└── <epic-slug>/
    ├── _epic.md
    └── <story-slug>/
        └── _story.md        # contains the Tasks checklist

Use pnpm work next to see what's ready. Use pnpm work dispatch to plan the next sandcastle dispatch.


The five conformance gates

| Gate | Latency | What it catches | Runs when |
|---|---|---|---|
| TypeScript brands | 0s | forgotten withSpan / withCapture / withAudit; manifest ↔ binding-slot type mismatch | on save (IDE) |
| ESLint (8 conformance/* rules) | <1s | manifest ↔ code drift; missing sibling test; missing manifest; atomic-tier import direction | on save / pnpm lint |
| Boot assertion | ~3s | runtime binding without required brand; manifest edited without rebinder | pnpm dev startup |
| pnpm conformance | ~120s | orphan event consumers across features | CI |
| pnpm fallow | ~30-60s | dead exports / unused files; duplicate code; circular deps; complexity hotspots; AI-change audit | CI |
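The 0s "TypeScript brands" gate works because a forgotten wrapper becomes a type error instead of a runtime surprise. The sketch below shows the general branded-type technique. spanBrand, Spanned and BindingSlots are hypothetical names, and the repo's real brand types may be shaped differently. The trace array stands in for a real OTel span.

```typescript
// Hypothetical brand-based check: only spanned use cases fit the slot.

declare const spanBrand: unique symbol;

type UseCase<I, O> = (input: I) => O;
type Spanned<F> = F & { readonly [spanBrand]: true };

// Wrap a use case in a pretend span and stamp the brand on the result.
function withSpan<I, O>(name: string, fn: UseCase<I, O>, trace: string[]): Spanned<UseCase<I, O>> {
  const wrapped = (input: I): O => {
    trace.push(`start ${name}`);
    try {
      return fn(input);
    } finally {
      trace.push(`end ${name}`);
    }
  };
  return wrapped as Spanned<UseCase<I, O>>;
}

// A binding slot that only accepts spanned use cases.
interface BindingSlots {
  signUp: Spanned<UseCase<{ email: string }, string>>;
}

const trace: string[] = [];
const slots: BindingSlots = {
  signUp: withSpan("signUp", (i: { email: string }) => `created ${i.email}`, trace),
  // signUp: (i) => `created ${i.email}`,  // type error: missing the span brand
};
```

The brand exists only at the type level, so it adds no runtime cost. The ~3s boot assertion is the runtime backstop for the same rule.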

For the full design see docs/architecture/agent-first-workflow-and-conformance.md. For the daily reference see docs/guides/conformance-quickref.md.


Using Sandcastle for agent dispatch

Sandcastle is the substrate that takes a markdown task description, hands it to a Claude / Codex agent running inside an isolated Docker sandbox, captures the agent's commits, and returns them so the orchestrator can route the diff to a reviewer agent. The repo's pnpm work dispatch wraps sandcastle for the manifest-first workflow.

When to use Sandcastle

  • Routine, well-specified tasks — adding a behaviour slice to an existing use case, migrating a feature to a new convention, scaffolding new packages. The task description is the contract; sandcastle automates the rest.
  • Parallel work — dispatch multiple independent tasks at once; each runs in its own sandbox branch.
  • Reviewer-loop verification — the reviewer agent reads the diff against the task spec and either approves or sends feedback for another implementer pass.

When NOT to use Sandcastle

  • Exploratory / design work — when the right answer isn't known, write it yourself. Sandcastle thrives when the task is "implement this", not "figure out what to do".
  • Cross-cutting refactors — dispatch is per-task, so a change that touches many unrelated files at once is better done in one human-driven session.
  • First-time integrations (e.g., adopting a new SDK) — better to walk through it manually, then capture the pattern as a generator for future sandcastle dispatches.

Prerequisites

  1. Docker running — sandcastle uses Docker for the sandbox by default. docker info should succeed.

  2. Sandcastle image built (one-time) — sandcastle dispatches into a tagged Docker image; you build it once per clone:

    pnpm exec sandcastle docker build-image
    # Tags as: sandcastle:template-vertical (derived from root package.json name)
    

    If dispatch fails with "Image 'sandcastle:template-vertical' not found locally. Build it first with 'sandcastle docker build-image'", this step was skipped.

    To rebuild after editing .sandcastle/Dockerfile:

    pnpm exec sandcastle docker remove-image
    pnpm exec sandcastle docker build-image
    
  3. Authentication — pick ONE:

    • Recommended: Claude Pro / Max subscription. Run claude login once on the host. Sandcastle's sandbox bind-mounts your ~/.claude/ into the container so the Claude Code CLI inside the sandbox uses your subscription session. Zero per-task token spend for subscribers.

      macOS quirk: Claude Code stores credentials in the macOS Keychain, NOT in ~/.claude/.credentials.json — so the bind-mount finds nothing. If you hit Not logged in · Please run /login inside the sandbox, extract the keychain credentials to a file once:

      security find-generic-password -s "Claude Code-credentials" -a "$USER" -w \
        > ~/.claude/.credentials.json
      chmod 600 ~/.claude/.credentials.json
      

      Trade-off: credentials now live as a plaintext file at the path; the macOS Keychain isolation is replaced by filesystem permissions (chmod 600 + your home dir's mode). When the token expires (~30 days), re-run the same one-liner. Linux + WSL hosts write ~/.claude/.credentials.json directly during claude login, so this step is macOS-only.

    • Alternative: API key. Set ANTHROPIC_API_KEY or OPENAI_API_KEY in your environment. Falls back automatically when ~/.claude/ is absent. Use this if you don't want a plaintext credentials file on disk.

    • Override the creds path via SANDCASTLE_CLAUDE_CREDS_DIR if your Claude Code config lives somewhere non-standard.

  4. GitHub token (optional) — GITHUB_TOKEN if you want the orchestrator to create PRs.

  5. .sandcastle/ config present — already in tree:

    • Dockerfile — node:22-bookworm + pnpm 9 + Claude Code CLI, running as a non-root agent user with home /home/agent; Claude Code reads creds from ~/.claude/ inside the container
    • prd-eliciter.prompt.md, adr-eliciter.prompt.md, decomposer.prompt.md, implementer.prompt.md, reviewer.prompt.md — the five role prompts

The dispatch flow

pnpm work next           → identifies the next ready story (DAG-aware)
pnpm work dispatch       → prints what WOULD be dispatched (no Sandcastle call)
pnpm work dispatch --execute
                         → invokes sandcastle.run(implementer prompt + task spec)
                         → sandcastle returns { branch, commits, stdout, ... }
                         → orchestrator computes `git diff main..<branch>`
                         → invokes sandcastle.run(reviewer prompt + diff)
                         → reviewer returns approve / reject + notes
                         → orchestrator prints suggested state mutation
                              (in v1: human ticks the bullet + commits manually)
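Here is the same flow as TypeScript, with every side effect injected so the control flow can be read and run on its own. RunFn loosely mirrors the sandcastle.run({ promptFile, promptArgs }) call that this runbook describes. Its exact signature and return type are assumptions. dispatch and DiffFn are hypothetical names and do not come from scripts/work/dispatch.mjs.

```typescript
// Hypothetical orchestrator loop: implementer, diff, reviewer.

interface RunResult { branch?: string; stdout: string }
type RunFn = (opts: { promptFile: string; promptArgs: Record<string, string> }) => Promise<RunResult>;
type DiffFn = (base: string, branch: string) => Promise<string>;

interface DispatchOutcome { branch: string; reviewerStdout: string }

async function dispatch(spec: string, run: RunFn, diff: DiffFn): Promise<DispatchOutcome> {
  // 1. Implementer pass inside the sandbox.
  const impl = await run({
    promptFile: ".sandcastle/implementer.prompt.md",
    promptArgs: { TASK_FILE_CONTENT: spec },
  });
  if (!impl.branch) throw new Error("implementer produced no branch");
  // 2. The orchestrator computes `git diff main..<branch>` on the host.
  const patch = await diff("main", impl.branch);
  // 3. Reviewer pass gets the same spec plus the diff.
  const review = await run({
    promptFile: ".sandcastle/reviewer.prompt.md",
    promptArgs: { TASK_FILE_CONTENT: spec, DIFF: patch },
  });
  // 4. v1 stops here: a human applies the suggested state mutation.
  return { branch: impl.branch, reviewerStdout: review.stdout };
}
```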

Worked example — dispatch a real task

Suppose pnpm work next reports:

auth-v1 / 02-sign-up — Sign up with email and password
  status: in-progress, tasks: 3/7

The story file docs/work/auth-v1/02-sign-up/_story.md has a Tasks list with the next unchecked bullet:

- [ ] Hash password using injected IPasswordHasher before persisting
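This is how "tasks: 3/7" and "the next unchecked bullet" can be derived from a _story.md checklist. The sketch is hypothetical and regex-based, and the real pnpm work scripts may parse differently. It only recognises top-level "- [ ]" and "- [x]" lines.

```typescript
// Hypothetical checklist parser for a story's Tasks list.

interface TaskProgress {
  done: number;
  total: number;
  next: string | null;
}

function taskProgress(markdown: string): TaskProgress {
  let done = 0;
  let total = 0;
  let next: string | null = null;
  for (const line of markdown.split(/\r?\n/)) {
    const m = line.match(/^- \[( |x|X)\] (.+)$/);
    if (!m) continue;
    total++;
    if (m[1] === " ") {
      // The first unchecked bullet is the next one to dispatch.
      if (next === null) next = m[2];
    } else {
      done++;
    }
  }
  return { done, total, next };
}
```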

Step 1 — Plan

pnpm work dispatch

Output:

=== Dispatch plan ===
  Epic:     auth-v1
  Story:    02-sign-up — Sign up with email and password
  Bullet:   - [ ] Hash password using injected IPasswordHasher before persisting
  Prompt:   .sandcastle/implementer.prompt.md

To execute this dispatch, run:
  ANTHROPIC_API_KEY=... pnpm work dispatch --execute

This is safe to run anywhere — it never invokes Sandcastle.

Step 2 — Execute

# Subscription mode (recommended):
claude login                         # one-time, host
pnpm work dispatch --execute         # uses ~/.claude/

# API-key mode (fallback):
ANTHROPIC_API_KEY=sk-ant-... pnpm work dispatch --execute

The orchestrator:

  1. Builds the task spec (story metadata + the current bullet + full story context)
  2. Calls sandcastle.run({ promptFile: ".sandcastle/implementer.prompt.md", promptArgs: { TASK_FILE_CONTENT: spec }, ... })
  3. Sandcastle starts a container from the locally built sandcastle:template-vertical image, mounts the repo at /home/agent/workspace, and runs claudeCode with the implementer prompt template populated
  4. The implementer agent (inside the sandbox):
    • Reads the task spec
    • Runs pnpm install --frozen-lockfile
    • Locates the use case: packages/auth/src/application/use-cases/sign-up.use-case.ts
    • Writes a red test asserting hasher.hash is called before repo.create
    • Runs pnpm test --filter @repo/auth — sees the red test fail
    • Adds IPasswordHasher to the factory deps; calls hasher.hash(input.password) before repo.create
    • Runs pnpm test --filter @repo/auth — green
    • Runs pnpm typecheck, pnpm lint, pnpm conformance, and pnpm fallow:audit, and confirms all of them pass
    • Commits on a sandbox branch (task/02-sign-up-hash-password or similar)
  5. Sandcastle returns: { branch: "task/02-sign-up-hash-password", commits: [{sha: "..."}], stdout: "...", ... }
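The red/green pair from step 4 might look roughly like this. IPasswordHasher is named in the story, but the repository interface, makeSignUpUseCase and the call-order helper are hypothetical stand-ins for the real @repo/auth code. In the repo the check would be a Vitest test. The core idea is to record call order with fakes and then assert on that order.

```typescript
// Hypothetical sign-up use case with an injected hasher.

interface IPasswordHasher { hash(plain: string): Promise<string> }
interface IUserRepository {
  create(user: { email: string; passwordHash: string }): Promise<{ id: string }>;
}
interface SignUpInput { email: string; password: string }

// Green: the hasher is a factory dependency and runs before repo.create.
function makeSignUpUseCase(deps: { hasher: IPasswordHasher; repo: IUserRepository }) {
  return async (input: SignUpInput) => {
    const passwordHash = await deps.hasher.hash(input.password);
    return deps.repo.create({ email: input.email, passwordHash });
  };
}

// The red test's core: fakes record the order of calls and what was stored.
async function hashRunsBeforeCreate(): Promise<{ calls: string[]; stored: string }> {
  const calls: string[] = [];
  let stored = "";
  const useCase = makeSignUpUseCase({
    hasher: { hash: async (p) => { calls.push("hash"); return `hashed:${p}`; } },
    repo: { create: async (u) => { calls.push("create"); stored = u.passwordHash; return { id: "u1" }; } },
  });
  await useCase({ email: "a@example.com", password: "s3cret" });
  return { calls, stored };
}
```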

Step 3 — Review

The orchestrator immediately runs the reviewer:

  1. Computes git diff main..task/02-sign-up-hash-password
  2. Calls sandcastle.run({ promptFile: ".sandcastle/reviewer.prompt.md", promptArgs: { TASK_FILE_CONTENT: spec, DIFF: diff }, ... })
  3. The reviewer agent reads the diff + task + story; verifies:
    • The AC bullet is satisfied (test was added; impl calls hasher.hash)
    • Nothing in the "Out of scope" section was touched (no drive-by edits)
    • All gates were run
    • The implementer ran pnpm fallow:audit
    • Generator-first was respected (no hand-rolled scaffolding)
  4. Returns { decision: "approve", ac_verified: [4], scope_violations: [], notes: "..." }
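Before the orchestrator acts on the reviewer's reply, it helps to validate the shape. The field names below come from this runbook (decision, ac_verified, scope_violations, notes, and generator_skipped from the troubleshooting section). Treating generator_skipped as optional is an assumption. isMergeable is a hypothetical, conservative policy that treats an "approve" with scope violations or skipped generators as not mergeable.

```typescript
// Hypothetical validation of the reviewer's JSON verdict.

interface ReviewVerdict {
  decision: "approve" | "reject";
  ac_verified: number[];
  scope_violations: string[];
  notes: string;
  generator_skipped?: boolean;
}

function parseVerdict(raw: string): ReviewVerdict {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) throw new Error("verdict must be an object");
  const v = parsed as Record<string, unknown>;
  const isNumArray = (x: unknown) => Array.isArray(x) && x.every((n) => typeof n === "number");
  const isStrArray = (x: unknown) => Array.isArray(x) && x.every((s) => typeof s === "string");
  if (v.decision !== "approve" && v.decision !== "reject") throw new Error("bad decision");
  if (!isNumArray(v.ac_verified)) throw new Error("bad ac_verified");
  if (!isStrArray(v.scope_violations)) throw new Error("bad scope_violations");
  if (typeof v.notes !== "string") throw new Error("bad notes");
  if (v.generator_skipped !== undefined && typeof v.generator_skipped !== "boolean") {
    throw new Error("bad generator_skipped");
  }
  return v as unknown as ReviewVerdict;
}

// Conservative policy: contradictory approvals are not mergeable.
function isMergeable(v: ReviewVerdict): boolean {
  return v.decision === "approve" && v.scope_violations.length === 0 && !v.generator_skipped;
}
```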

Step 4 — State mutation (v1: manual)

The orchestrator prints:

=== Suggested state mutation ===
  Edit docs/work/auth-v1/02-sign-up/_story.md — tick the bullet:
    - [x] Hash password using injected IPasswordHasher before persisting
  Then: pnpm work rebuild-state && git add -A && git commit -m "..."

(Automatic state mutation by the orchestrator is v2.)
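The manual edit in the suggested state mutation flips exactly one bullet. Here it is as a function. tickBullet is hypothetical, since in v1 you make this edit by hand. It throws if the unchecked bullet cannot be found, which also catches ticking the same bullet twice.

```typescript
// Hypothetical helper mirroring the suggested manual edit.

function tickBullet(markdown: string, bulletText: string): string {
  const target = `- [ ] ${bulletText}`;
  const lines = markdown.split("\n");
  const index = lines.findIndex((line) => line.trimEnd() === target);
  if (index === -1) throw new Error(`unchecked bullet not found: ${bulletText}`);
  lines[index] = `- [x] ${bulletText}`;
  return lines.join("\n");
}
```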

You (the human) then:

  1. Merge the sandbox branch: git merge --no-ff task/02-sign-up-hash-password
  2. Tick the bullet in the story markdown
  3. Commit; the pre-commit hook auto-runs pnpm work rebuild-state + re-stages _state.json
  4. Push. CI runs the full gate stack (typecheck + test + lint + conformance + fallow + boundaries + visual regression).

Troubleshooting Sandcastle

✗ --execute requires either: 1. Claude Code logged in on host ... 2. ANTHROPIC_API_KEY ... — No auth resolved. Run claude login to enable subscription mode (recommended), OR set ANTHROPIC_API_KEY (fallback). Override the host creds path via SANDCASTLE_CLAUDE_CREDS_DIR.

Error: Cannot find module '@ai-hero/sandcastle' — Run pnpm install. Sandcastle is a dev dependency at the workspace root.

Error: docker: command not found or sandcastle hangs at "starting sandbox" — Docker isn't running. docker info to confirm. On macOS, start Docker Desktop.

The implementer agent times out — Default idleTimeoutSeconds is 600 (10 minutes). For complex tasks, increase it in scripts/work/dispatch.mjs: find the run({...}) call and add idleTimeoutSeconds: 1800.

The reviewer rejects with generator_skipped: true — The implementer hand-rolled what should have been generator output. Either re-dispatch (it gets the reviewer notes), or delete the implementer's diff and run pnpm turbo gen <kind> manually first, then dispatch the customisation as a separate task.

The reviewer rejects with scope_violations: [...] — The implementer touched files outside the AC. Re-dispatch with stricter scope; the rejection notes are passed back as context.

Cost control — each dispatch typically uses 50K-200K agent tokens depending on task complexity. The orchestrator does NOT cap retries. The planned limit is max-attempts: 1 in the task's frontmatter, which the orchestrator will respect in v2; for now, just don't re-run dispatch after a reject.

Sandbox boots but Claude Code inside it says "Not authenticated" / "API key required" — The host ~/.claude/ mount didn't make it into the sandbox, OR your local Claude Code session expired. On the host, run claude once to confirm your session is live, then re-dispatch. On macOS, "Not logged in · Please run /login" means the credentials still live only in the Keychain; see the macOS quirk under Prerequisites, step 3. If you're on Linux + SELinux, the mount may have been blocked: check the sandcastle output for SELinux warnings, and if needed set selinuxLabel: "z" or false in the docker options in scripts/work/dispatch.mjs.
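The fixing commit makes the dispatcher print the macOS fix at the failure site. The trigger is a sandcastle error that matches /Not logged in|Please run \/login/ on a darwin host. In that case the keychain commands and the API-key fallback appear above the generic runbook pointer. Below is a reconstruction of that logic, not the code in scripts/work/{dispatch,decompose}.mjs. authFailureHint is a hypothetical name, platform would be process.platform in the real script, and the wording of the final pointer line is assumed.

```typescript
// Hypothetical reconstruction of the darwin-only login hint.

const NOT_LOGGED_IN = /Not logged in|Please run \/login/;

function authFailureHint(errorText: string, platform: string): string[] {
  const lines: string[] = [];
  if (NOT_LOGGED_IN.test(errorText) && platform === "darwin") {
    lines.push(
      "macOS keeps Claude Code credentials in the Keychain; extract them once:",
      '  security find-generic-password -s "Claude Code-credentials" -a "$USER" -w > ~/.claude/.credentials.json',
      "  chmod 600 ~/.claude/.credentials.json",
      "Or skip the plaintext file and set ANTHROPIC_API_KEY instead.",
    );
  }
  // The generic pointer is always printed last.
  lines.push("See docs/guides/runbook.md (Using Sandcastle) for details.");
  return lines;
}
```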

Cost-aware variant: planning-only loop

If you want sandcastle's structure without the agent spend, use planning mode + manual execution:

pnpm work dispatch              # prints the plan
# (you implement the bullet manually in your editor)
# tick the bullet in docs/work/.../...story.md
# commit; pre-commit auto-rebuilds _state.json
pnpm work dispatch              # prints the NEXT plan

This gives you the same DAG-aware "what's next?" without invoking any agent. Useful for exploratory work or low-budget contexts.


Troubleshooting

pnpm dev refuses to boot with ConformanceError — A feature's binding lost a required brand. The error message tells you which use case + which brand. Re-bind through withSpan / withCapture / withAudit as needed.

pnpm lint errors with conformance/feature-must-have-manifest — You created a feature with use cases but no feature.manifest.ts. Run pnpm turbo gen feature <name> to scaffold the canonical shape, or hand-write the manifest at packages/<feature>/src/feature.manifest.ts.

pnpm conformance says "orphan consumer" — A feature declares consumes: ["X"] but no feature publishes X. Either add the publish to the producing feature's manifest + factory, or remove the consumer.
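The closure rule behind "orphan consumer" is that every event a feature consumes must be published by at least one feature. The sketch uses a hypothetical, simplified subset of feature.manifest.ts. The real pnpm conformance gate does more work and takes longer than this.

```typescript
// Hypothetical cross-feature event closure check.

interface EventManifest {
  feature: string;
  publishes: readonly string[];
  consumes: readonly string[];
}

interface OrphanConsumer { feature: string; event: string }

function findOrphanConsumers(manifests: readonly EventManifest[]): OrphanConsumer[] {
  // Collect every event any feature publishes.
  const published = new Set<string>();
  for (const m of manifests) for (const e of m.publishes) published.add(e);
  // Report each consumed event that nobody publishes.
  const orphans: OrphanConsumer[] = [];
  for (const m of manifests) {
    for (const event of m.consumes) {
      if (!published.has(event)) orphans.push({ feature: m.feature, event });
    }
  }
  return orphans;
}
```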

pnpm fallow reports new dead exports or dupes — Your change added unused exports or duplicated logic. Either remove the dead code or accept with pnpm fallow:audit --gate all (the audit considers the baseline; only NEW findings fail).

Pre-commit hook refuses to commit with "state-sync-guard" — You staged docs/work/_state.json but it's not byte-identical to pnpm work rebuild-state output. Run pnpm work rebuild-state && git add docs/work/_state.json and try again.
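A byte-identical guard only works if the rebuild is deterministic, meaning the same markdown always serialises to the same bytes. Here is a sketch of one way to get that: sorted keys, fixed indentation and a trailing newline. Both helpers are hypothetical, and the real rebuild script may serialise differently.

```typescript
// Hypothetical deterministic serialisation + sync check.

function stableStringify(value: unknown): string {
  const sortKeys = (v: unknown): unknown => {
    if (Array.isArray(v)) return v.map(sortKeys);
    if (v !== null && typeof v === "object") {
      const obj = v as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const k of Object.keys(obj).sort()) out[k] = sortKeys(obj[k]);
      return out;
    }
    return v;
  };
  return JSON.stringify(sortKeys(value), null, 2) + "\n";
}

// Any difference means a hand edit or a stale file.
function isInSync(stagedBytes: string, rebuiltState: unknown): boolean {
  return stagedBytes === stableStringify(rebuiltState);
}
```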

Tests fail in @repo/turbo-generators with Vitest worker timeouts — Known flaky on slow machines. Re-run; if persistent, increase the turbo-generators package's vitest testTimeout.

pnpm work dispatch --execute errors with "requires either: 1. Claude Code logged in..." — No auth source found. Run claude login (subscription mode, recommended), or set ANTHROPIC_API_KEY (fallback). Run pnpm work dispatch (no flag) to just print the plan without auth.


Once you've got pnpm dev running, read these in order:

  1. AGENTS.md — package map, boundary rules, per-package conventions
  2. CLAUDE.md — full convention reference (manifest-first ordering, factory patterns, instrumentation rules)
  3. docs/guides/conformance-quickref.md — daily manifest + gates reference
  4. docs/guides/tdd-workflow.md — red-green-refactor with the gate stack
  5. docs/guides/scaffolding-a-feature.md: pnpm turbo gen feature reference
  6. docs/guides/adding-a-feature.md — end-to-end walkthrough
  7. docs/architecture/agent-first-workflow-and-conformance.md — the full design
  8. docs/architecture/feature-conformance-explainer.html — interactive explainer (open in browser)

For deeper topics:

  • docs/guides/events-and-jobs.md — cross-feature events (requires gen core-package events)
  • docs/guides/realtime.md — Socket.IO channels (requires gen core-package realtime)
  • docs/guides/audit-and-compliance.md — DPA-compliant audit logging (requires gen core-package audit)
  • docs/guides/frontend-work-shape.md — atomic design + Storybook conventions
  • docs/guides/infrastructure-work-shape.md — ADR-first flow for new infrastructure

Common pitfalls

  • Skipping the generator. Always run pnpm turbo gen <kind> before hand-rolling. Generators emit the canonical shape; the CI scaffold-drift check will fail on hand-rolled features.
  • Forgetting pnpm work rebuild-state after editing docs/work/ markdown. The pre-commit hook handles this automatically when you stage markdown; it only matters when the hook doesn't run (for example, a commit made with --no-verify).
  • Bypassing the pre-commit hook with --no-verify. The hook catches drift early. If it's blocking a legitimate change, fix the underlying issue, not the hook.
  • Hand-editing _state.json. Don't. The state-sync-guard refuses commits that drift from rebuild output. Edit the markdown; let the rebuild script propagate.
  • Committing .env. It's gitignored. Use .env.example for new vars.

For deeper philosophy: this template is built around the assumption that AI agents will author most feature work. The conformance system is designed as an agent feedback loop. Latency-layered gates compound: 0s + <1s + 3s + 120s + 60s. The faster the inner loop, the more iterations agents can make per task.

If you're a human contributor, the same workflow applies — the gates aren't punitive, they're navigational aids.