Commit Graph

20 Commits

Author SHA1 Message Date
9e7723f9a5 fix(scripts): remove broken session-resume from dispatch loop
Sandcastle rejects `resumeSession` when `maxIterations > 1` with
"Resume applies to iteration 1 only; multi-iteration resume
semantics are not supported." Since a TDD slice needs the full
30-iteration budget, the session-resume path we shipped in d5c0120
is dead infrastructure that breaks dispatch mid-run.

Rip it out cleanly:
- runOneSlice drops the resumeSession param + the
  context-exhaustion safety net + sessionId/usage return fields
- executeDispatch drops the currentStory/currentSession bookkeeping
  and the token-reset threshold
- helpers totalInputTokens + isContextExhaustedError go (only used
  by the resume path)
- SANDCASTLE_SESSION_TOKEN_RESET removed from .env.example

Net: -153 lines. Each slice is again an independent sandcastle
session; token cost per slice goes up (each implementer
re-discovers context) but the multi-iteration TDD shape works.
A different cross-slice context-passing mechanism (e.g. a
story-level context summary injected into each task spec) is left
as future work.
2026-05-14 11:48:32 +02:00
d5c01209ea feat(work): resume implementer session across same-story slices
Wires sandcastle's native `resumeSession` into the dispatch loop so
the implementer walks into task N already knowing what task N-1
discovered — repo layout, helper signatures, gate output, prior diff.
No scratchpad / no hand-curated context file; the agent's own Claude
Code conversation log is the carrier.

Three guardrails keep it bounded:

- Story boundary reset. `currentSession` is dropped whenever
  findNextTask returns a different story id. New domain ≈ new
  context — keeps story 03 from inheriting story 02's residue.
- Token-threshold reset. After each approved slice, sum the
  implementer's last-iteration usage (inputTokens +
  cacheCreationInputTokens + cacheReadInputTokens — caching saves
  dollars but doesn't free window space). If above
  SANDCASTLE_SESSION_TOKEN_RESET (default 140000 ≈ 70% of Sonnet
  4.6's 200k), drop the session before the next task. Configurable
  via env.
- Context-exhausted safety net. If the model rejects with
  "prompt is too long" / "context_length_exceeded" / similar, the
  retry loop drops the session and re-runs the attempt fresh
  exactly once. Doesn't count against SANDCASTLE_MAX_ATTEMPTS
  (different failure mode).
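
The token-threshold and context-exhausted checks above can be sketched roughly as follows. The usage field names, the env variable, and the 140000 default come from this message; `shouldResetSession`, the shape of the `usage` object, and the exact error matching are assumptions made for illustration, not the shipped code.

```javascript
// Sketch only: field names follow the commit message; the rest is assumed.
const DEFAULT_TOKEN_RESET = 140000; // about 70% of a 200k window, per the message

// Sum every token class that occupies the context window. Cached tokens are
// cheaper to bill but still take up space, so they are counted as well.
function totalInputTokens(usage = {}) {
  return (
    (usage.inputTokens ?? 0) +
    (usage.cacheCreationInputTokens ?? 0) +
    (usage.cacheReadInputTokens ?? 0)
  );
}

// Hypothetical helper: should the session be dropped before the next task?
function shouldResetSession(usage, env = {}) {
  const parsed = Number.parseInt(env.SANDCASTLE_SESSION_TOKEN_RESET ?? "", 10);
  const threshold = Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_TOKEN_RESET;
  return totalInputTokens(usage) > threshold;
}

// Matches the two error strings quoted in the message; the "similar"
// variants it mentions are not covered by this sketch.
function isContextExhaustedError(err) {
  const text = String(err?.message ?? err ?? "");
  return /prompt is too long|context_length_exceeded/i.test(text);
}
```

Passing `process.env` as the second argument of `shouldResetSession` would pick up the configured threshold; an unset or non-numeric value falls back to the default.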

Reviewer always runs fresh — each approve/reject decision should be
independent of prior tasks to keep the gate honest. Within a single
slice's reject-fixup retries, the implementer also carries forward
across attempts (so attempt 2 sees attempt 1's reasoning + the
reviewer notes), but that's per-slice cumulative, not cross-slice.

runOneSlice now returns { sessionId, usage } so executeDispatch can
make the carry-or-reset decision per slice.
2026-05-13 20:13:30 +02:00
edbc6a8fad feat(work): dispatch loops + auto-ticks state on approve
Previously the orchestrator ran exactly one implementer + reviewer pair,
printed "(Automatic state mutation by the orchestrator is v2.)", and
exited — the human had to tick the bullet, flip story status, rebuild
state, and re-invoke for every slice. V2 closes the loop:

- Parses the JSON the implementer + reviewer prompts ask the agents to
  emit (`parseAgentJson` — tolerates both ```json fenced and bare
  trailing { ... } shapes). The reviewer's `decision` and the
  implementer's `status` are the orchestrator's discriminators.
- On approve: ticks the bullet in `_story.md` and writes it back. If
  the story now has zero unchecked bullets, flips its frontmatter
  `status: in-progress → done`; if all sibling stories are also done,
  flips the epic's frontmatter the same way. Commits the mutation on
  the host as a separate `chore(work): tick/finish ...` commit so the
  implementer's slice commit stays clean. `_state.json` regenerates
  via the existing pre-commit `rebuild-state` hook.
- On reject: re-dispatches the implementer with the reviewer's notes
  appended to TASK_FILE_CONTENT, bounded by SANDCASTLE_MAX_ATTEMPTS
  (default 3). On the (max+1)th reject the loop exits 1 with the last
  notes printed.
- After every approved slice, calls findNextTask again and dispatches
  the next ready bullet — including across story boundaries (the
  state-builder treats any non-done story with satisfied deps as
  ready, so flipping story 01 to done unblocks story 02 automatically).
- Flags: `--once` (legacy single-slice behavior) and `--max-tasks N`
  bound the loop. Default is unlimited — matches the
  continuous-execution preference.
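
As an illustration of the two output shapes `parseAgentJson` is said to tolerate, a minimal sketch might look like the following. The function name comes from this message; the null-on-failure return and the scanning strategy are assumptions.

```javascript
// Built at runtime so this example does not embed a literal code fence.
const FENCE = "`".repeat(3);

// Sketch: return the parsed object, or null when no JSON payload is found.
function parseAgentJson(output) {
  const text = String(output ?? "");

  // Shape 1: use the last json-fenced block in the output, if any.
  const fencedRe = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE, "g");
  const fenced = [...text.matchAll(fencedRe)];
  if (fenced.length > 0) {
    try {
      return JSON.parse(fenced[fenced.length - 1][1]);
    } catch {
      // Malformed fenced block: fall through to the bare-object scan.
    }
  }

  // Shape 2: a bare trailing { ... }. Walk "{" candidates from right to left
  // until the slice up to the final "}" parses as JSON.
  const end = text.lastIndexOf("}");
  if (end < 0) return null;
  let start = text.lastIndexOf("{", end);
  while (start >= 0) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Slice is not a complete object yet; try an earlier "{".
    }
    if (start === 0) break;
    start = text.lastIndexOf("{", start - 1);
  }
  return null;
}
```

An orchestrator could then branch on `parseAgentJson(reviewerOutput)?.decision`, treating a `null` result as a failed run instead of guessing.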

Auth/sandbox setup is now pulled out of the per-iteration path so the
loop reuses one sandbox across slices.
2026-05-13 19:43:11 +02:00
eadbb7ebd9 fix(work): emit completion signal to stop sandcastle agent loops
Sandcastle re-invokes agents up to maxIterations even when the work is
already done — the decomposer was looping 4x re-writing the same epic
on every dispatch. Two halves to the fix:

- Pass completionSignal: "<promise>COMPLETE</promise>" explicitly on
  all three run() calls (decompose, implementer, reviewer). Makes the
  contract visible alongside maxIterations instead of relying on
  sandcastle's default.
- Append a "Signal completion (required)" section to each prompt
  telling the agent to emit the literal marker as its final line when
  the work is genuinely done, plus a "do NOT emit if..." list to
  discourage premature signaling.
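
The prompt contract above (the literal marker as the agent's final line) can be illustrated with a small check. How sandcastle itself detects the signal is not described here, so `endsWithCompletionSignal` is a hypothetical helper showing the contract, not sandcastle's implementation.

```javascript
// The marker string is quoted from this commit message.
const COMPLETION_SIGNAL = "<promise>COMPLETE</promise>";

// Hypothetical check: true only when the marker is the last non-empty line.
function endsWithCompletionSignal(output) {
  const lines = String(output ?? "")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
  return lines.length > 0 && lines[lines.length - 1] === COMPLETION_SIGNAL;
}
```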
2026-05-13 19:11:44 +02:00
26aa97f0ef fix(work): bump sandcastle maxIterations so agents finish + commit
Sandcastle's default maxIterations: 1 cut every agent off after its
first response, so files written inside the sandbox never made it
into a captured commit. The decomposer wrote 9 epic + story files,
hit the limit, and sandcastle returned 0 commits — the host saw
nothing.

decompose.mjs: maxIterations 10 (small authoring task — read
context, write files, commit). Override via env
SANDCASTLE_DECOMPOSE_ITERATIONS.

dispatch.mjs:
- Implementer: maxIterations 30 (full TDD slice — read context,
  red test, green impl, run all five gates, commit). Override via
  SANDCASTLE_IMPLEMENTER_ITERATIONS.
- Reviewer: maxIterations 10 (read diff + task spec, decide).
  Override via SANDCASTLE_REVIEWER_ITERATIONS.

Each call site documents WHY the value was picked + names the env
override inline so tuning is discoverable from the code.
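
A minimal sketch of the default-plus-override pattern described above. The env variable names and the defaults (10 / 30 / 10) are taken from this message; `iterationsFromEnv` and `resolveIterations` are hypothetical names, and the fallback on invalid values is an assumption.

```javascript
// Hypothetical helper: read a positive integer from env, else use the default.
function iterationsFromEnv(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Defaults and variable names as listed in the commit message.
function resolveIterations(env = {}) {
  return {
    decompose: iterationsFromEnv(env, "SANDCASTLE_DECOMPOSE_ITERATIONS", 10),
    implementer: iterationsFromEnv(env, "SANDCASTLE_IMPLEMENTER_ITERATIONS", 30),
    reviewer: iterationsFromEnv(env, "SANDCASTLE_REVIEWER_ITERATIONS", 10),
  };
}
```

Calling `resolveIterations(process.env)` would yield the three budgets, with each value then passed as `maxIterations` at the matching call site.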
2026-05-13 18:56:59 +02:00
52b4409d94 docs(work): refresh stale JSDoc headers in cli + prd-ship
Two stale-comment fixes surfaced after the dispatch handoff fix:

cli.mjs: the top-of-file JSDoc listed only 3 of 8 subcommands
(rebuild-state, status, next) and missed ready / blocked /
dispatch / decompose / prd-ship. Rewrote the header to describe
all 8 subcommands + their flags + the explicit-runCli routing
pattern that replaces the older side-effect-on-import approach
(established when the dispatch handoff broke and got fixed in
bb643b8).

prd-ship.mjs: the JSDoc claimed allowed transitions were
"<approved|in-review|draft> -> shipped", but the code refuses
draft (throws "still draft — flip to approved (human review)
before shipping"). Corrected the doc to "<approved|in-review>
-> shipped" + clarified that draft -> approved is the human step
deliberately kept outside the command's scope.

No behaviour change — comments only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:18:42 +02:00
bb643b8635 fix(work): dispatch CLI handoff broke after import-side-effect guard
cli.mjs's `dispatch` branch called `import("./dispatch.mjs")` and
relied on dispatch.mjs's top-level CLI block running as a side
effect of the import. The earlier guard added to dispatch.mjs (to
stop the CLI firing when sibling work scripts import
`resolveClaudeAuth`) also stopped this legit handoff — so
`pnpm work dispatch` silently exited with no output.

Fix: explicit CLI entry function, called by name. Same pattern
already in use for prd-ship + decompose.

dispatch.mjs:
  - Wraps the args parsing + print/execute branch in `export async
    function runCli(args)`
  - The invokedDirectly guard now wraps `runCli(process.argv.slice(2))`
    so direct-invocation (`node scripts/work/dispatch.mjs ...`) still
    works

cli.mjs:
  - Imports runCli as runDispatch
  - The `cmd === "dispatch"` branch calls runDispatch(args) directly
    with a .catch attached (instead of import("./dispatch.mjs"))
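
The call-by-name routing can be sketched as a lookup table of entry functions. The stand-in handlers, the `route` helper, and the exit codes below are assumptions for illustration; the real cli.mjs imports each script's `runCli` export.

```javascript
// Hypothetical stand-ins for the runCli exports of the real scripts.
const handlers = {
  dispatch: async (args) => 0,
  decompose: async (args) => 0,
};

// Look the entry point up by name; nothing depends on import side effects.
function resolveHandler(cmd, table) {
  return Object.prototype.hasOwnProperty.call(table, cmd) ? table[cmd] : null;
}

async function route(cmd, args, table = handlers) {
  const run = resolveHandler(cmd, table);
  if (!run) return 2; // assumed exit code for an unknown subcommand
  // Counterpart of the attached .catch: a rejected handler becomes exit 1.
  return run(args).catch((err) => {
    console.error(err?.message ?? err);
    return 1;
  });
}
```

Because the handler is invoked explicitly, importing a script for one of its helpers can no longer trigger or suppress its CLI by accident.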

Verified: `pnpm work dispatch` now correctly prints the dispatch
plan for the first ready task (`binder-wrap-helper /
01-wire-use-case-helper`'s first bullet); decompose tests stay 9/9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:17:47 +02:00
9d4b801909 fix(work): wire inline macOS keychain hint into dispatch + decompose error paths
The dispatch.mjs + decompose.mjs error handlers gained an image-not-found
hint in cd0a332, but the macOS keychain hint that the earlier
commit's message claimed was never actually applied (the Edit tool
required re-reading those files post-commit).

This commit applies the keychain hint to both error handlers: when
the sandcastle error matches /Not logged in|Please run \/login/ AND
process.platform === "darwin", the dispatcher prints the
`security find-generic-password ... > ~/.claude/.credentials.json`
one-liner + chmod 600 + the API-key fallback inline above the
generic "See runbook" line.
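
A rough sketch of the condition described above. The regex and the platform check are quoted from this message; `keychainHintLines` and the hint wording are hypothetical, and the elided part of the `security` command is left elided, as in the message.

```javascript
// Pattern quoted from the commit message.
const NOT_LOGGED_IN = /Not logged in|Please run \/login/;

// Hypothetical helper: lines to print above the generic "See runbook" line.
// Returns an empty array when the hint does not apply.
function keychainHintLines(errorMessage, platform) {
  if (platform !== "darwin") return [];
  if (!NOT_LOGGED_IN.test(String(errorMessage ?? ""))) return [];
  return [
    "macOS: export the Claude credentials from the keychain:",
    "  security find-generic-password ... > ~/.claude/.credentials.json",
    "  chmod 600 ~/.claude/.credentials.json",
    "Alternatively, fall back to an API key.",
  ];
}
```

An error handler would call it as `keychainHintLines(err.message, process.platform)` and print whatever comes back.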

Now future agents hitting this on macOS see the fix at the failure
site, not just in docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:03:13 +02:00
cd0a332443 docs: surface sandcastle image-build step (one-time setup)
Closes the gap the user hit running `pnpm work decompose --execute`:
sandcastle errored with `Image 'sandcastle:template-vertical' not
found locally. Build it first with 'sandcastle docker build-image'`,
but neither the README nor the runbook documented this step.

README.md: new "Sandcastle setup (one-time)" section after Quick
reference. Three commands (docker info, build-image, auth) — the
minimum needed to make dispatch work. Links to the runbook for the
full lifecycle.

docs/guides/runbook.md: Prerequisites in "Using Sandcastle" grow
from 4 to 5 items. New step 2 walks through `sandcastle docker
build-image`, quotes the exact "Image not found locally" error so
agents searching for the string land on the fix, and shows the
remove-image + rebuild flow for Dockerfile edits.

.sandcastle/README.md: new "Build the sandbox image (one-time)"
section parallel to the env section, cross-linking to the runbook.

scripts/work/decompose.mjs + scripts/work/dispatch.mjs: when the
sandcastle error message matches the "Image '.+' not found locally"
pattern, the dispatcher now prints the build-image command inline
above the generic "See runbook" line. The error stack itself remains
unchanged.
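
The error-pattern match can be illustrated as below. The pattern and the `sandcastle docker build-image` command are quoted from this message; `imageBuildHint` and its null return are assumptions.

```javascript
// Pattern quoted from the commit message.
const IMAGE_NOT_FOUND = /Image '.+' not found locally/;

// Hypothetical helper: the command to suggest, or null when the error is
// unrelated to a missing sandbox image.
function imageBuildHint(errorMessage) {
  return IMAGE_NOT_FOUND.test(String(errorMessage ?? ""))
    ? "sandcastle docker build-image"
    : null;
}
```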

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:51:30 +02:00
014578c9a8 feat(work): pnpm work decompose subcommand
Closes the gap surfaced by the user: `pnpm work` usage referenced
`decompose` (via docs + the to-prd skill) but the subcommand was
never built. Mirrors `pnpm work dispatch`'s shape.

scripts/work/decompose.mjs (new):
  - validatePrdForDecompose(prdPath) — refuses draft (must go
    through human review first), in-review (review incomplete),
    shipped (epic already exists); accepts only approved
  - printDecomposePlan(prdId, prdPath, frontmatter) — print-mode
    output showing the PRD's eligibility + sandcastle invocation
    plan + auth modes
  - executeDecompose(prdId, prdPath, prdText) — invokes sandcastle
    with .sandcastle/decomposer.prompt.md, passing PRD_FILE_CONTENT
    promptArg. The decomposer agent writes the epic + per-story
    files to disk on a sandcastle branch the human can review
  - runCli(args, { workRoot }) — entry point used by cli.mjs
  - Direct invocation also supported (mirrors dispatch.mjs's
    invokedDirectly guard, NEW pattern after this commit)
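
The status gate inside `validatePrdForDecompose` might be sketched as follows. The accepted and refused statuses and their reasons come from this message; `checkDecomposeEligibility` and its return shape are hypothetical, and the real function takes a PRD path and reads the frontmatter itself.

```javascript
// Refusal reasons as stated in the commit message.
const REFUSALS = new Map([
  ["draft", "must go through human review first"],
  ["in-review", "review incomplete"],
  ["shipped", "epic already exists"],
]);

// Hypothetical pure check on the frontmatter status value.
function checkDecomposeEligibility(status) {
  if (status === "approved") return { ok: true };
  const reason = REFUSALS.get(status) ?? `unexpected status: ${String(status)}`;
  return { ok: false, reason };
}
```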

scripts/work/decompose.test.mjs (new, 9 tests, all green):
  - validatePrdForDecompose: accepts approved; rejects draft,
    in-review, shipped, unknown status, missing file
  - runCli: writes error + returns 1 on missing PRD; writes error
    + returns 1 on draft PRD; prints plan + returns 0 on approved

scripts/work/cli.mjs:
  - Adds `decompose` subcommand to usage + dispatch
  - Usage formatting realigned for the 3-line subcommand block

scripts/work/dispatch.mjs:
  - **Fix** the bug surfaced by the user: dispatch.mjs's CLI ran
    as a top-level side effect whenever any of its exports was
    imported. decompose.mjs imports resolveClaudeAuth from it, so
    importing decompose.mjs printed "No ready task to dispatch."
    Added an `import.meta.url === \`file://${process.argv[1]}\``
    guard so the CLI only runs when invoked directly. This unblocks
    cross-import without side effects.
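
The guard's comparison can be isolated into a small function for illustration. The `file://` string comparison mirrors the expression quoted above; `isInvokedDirectly` taking both values as parameters is an assumption made here so the check can run outside an ES module.

```javascript
// Mirrors the comparison quoted in the message. A real module would pass
// import.meta.url and process.argv[1].
function isInvokedDirectly(moduleUrl, argv1) {
  return typeof argv1 === "string" && moduleUrl === `file://${argv1}`;
}
```

A plain string comparison like this can miss paths that need URL encoding (spaces, for example) or Windows paths; Node's `url.pathToFileURL` is one way to normalise `argv[1]` before comparing.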

Smoke-tested end-to-end:
  - `pnpm work decompose` (no id) prints usage + exits 2
  - `pnpm work decompose 2026-05-13-binder-wrap-helper` prints the
    decompose plan with status: approved (eligible)
  - 9/9 unit tests green
  - dispatch.mjs's existing direct-invocation path unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 17:46:57 +02:00
32d20872e3 feat(work): pnpm work prd-ship + auto-flip integration in sandcastle
Closes the PRD-lifecycle gap surfaced by the user: when sandcastle
finishes an epic's last task, the seed PRD should auto-flip from
approved -> shipped. Builds the mechanism, wires it into the work
CLI + state index + reviewer prompt + docs.

scripts/work/prd-ship.mjs (new):
  - parseFrontmatter / serializeFrontmatter — minimal YAML-ish parser
    sufficient for PRD frontmatter (scalar + list shapes)
  - flipPrdStatus — pure function: takes PRD text, returns new text
    with status=shipped + shipped=<date> + optional shipping-commits.
    Refuses to flip draft, idempotent fail-soft on already-shipped,
    rejects unexpected statuses
  - deriveShippingCommits — best-effort git log of the linked epic
    folder for the --auto-commits flag
  - findPrdPath — id -> path lookup under docs/work/prds/
  - runCli — wiring for `pnpm work prd-ship <id> [--commits|--auto-commits]`
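
A minimal sketch of the `flipPrdStatus` behaviour listed above, assuming `---`-delimited frontmatter with a `status:` line. The real module goes through `parseFrontmatter` / `serializeFrontmatter`; the regex handling, the error messages, and the omission of shipping-commits here are simplifications.

```javascript
// Sketch: pure text-in, text-out. shippedDate is a string such as "2026-05-14".
function flipPrdStatus(prdText, shippedDate) {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(prdText);
  if (!match) throw new Error("PRD has no frontmatter");

  const statusLine = /^status:[ \t]*(.*)$/m.exec(match[1]);
  const status = statusLine ? statusLine[1].trim() : "";

  // Already shipped: return the input unchanged (idempotent, fail-soft).
  if (status === "shipped") return prdText;
  if (status === "draft") {
    throw new Error("still draft: flip to approved before shipping");
  }
  if (status !== "approved" && status !== "in-review") {
    throw new Error(`unexpected status: ${status || "(missing)"}`);
  }

  // Rewrite the status line and add the shipped date directly after it.
  const frontmatter = match[1].replace(
    /^status:.*$/m,
    () => `status: shipped\nshipped: ${shippedDate}`,
  );
  return `---\n${frontmatter}\n---\n${prdText.slice(match[0].length)}`;
}
```

Keeping the function pure means the CLI wrapper owns all file reads and writes, which is what makes the transition table easy to unit-test.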

scripts/work/prd-ship.test.mjs (new, 17 tests):
  - Frontmatter parser handles scalars + lists + missing frontmatter
  - flipPrdStatus covers all transitions + refusals + body/key preservation
  - findPrdPath + serializeFrontmatter coverage

scripts/work/state-builder.mjs:
  - Epic entries gain a `prd` field
  - New computeNeedsPrdShip surfaces epics that are done but whose PRD
    is not yet shipped, as state.needs_prd_ship[] entries with action commands

scripts/work/cli.mjs:
  - New subcommand `pnpm work prd-ship <id>`

.sandcastle/reviewer.prompt.md:
  - "Epic close-out: PRD status flip" section instructing reviewer to
    check _state.json.needs_prd_ship and run the suggested action
  - JSON output extends with prd_shipped: "<id>" | null

docs/work/README.md:
  - "PRD lifecycle" section documenting the 4 statuses + auto-flip

Future PRDs follow the lifecycle automatically: decomposer refuses
draft, human flips to approved, sandcastle ships the epic, reviewer
runs prd-ship on the final task, PRD lands as shipped with its
commit trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 16:51:48 +02:00
4e1167e390 test(scripts): resolveClaudeAuth — subscription/api-key/missing modes
2026-05-13 09:30:11 +02:00
936611ba62 feat(scripts): dispatch.mjs — subscription-first auth via ~/.claude mount
2026-05-13 09:28:20 +02:00
d1b00f1cf5 feat(scripts): pnpm work dispatch — wire CLI to dispatch.mjs
2026-05-13 08:19:19 +02:00
da811eb461 feat(scripts): dispatch.mjs — planner + execute-mode skeleton
2026-05-13 08:18:58 +02:00
4cf979aaa5 feat(scripts): pnpm work ready + blocked subcommands, DAG-aware next
2026-05-13 08:05:19 +02:00
23fedac1a8 feat(scripts): state-builder reads depends-on + blocks from frontmatter
2026-05-13 08:04:38 +02:00
1ebffa68a6 feat(scripts): state-sync-guard for pre-commit safety net
2026-05-13 07:54:03 +02:00
be8e89baed feat(scripts): pnpm work CLI — rebuild-state, status, next
2026-05-13 07:46:51 +02:00
6b57d76dc2 feat(scripts): work state-builder — walks docs/work/ tree
2026-05-13 07:46:28 +02:00