Commit Graph

8 Commits

Author SHA1 Message Date
9e7723f9a5 fix(scripts): remove broken session-resume from dispatch loop
Sandcastle rejects `resumeSession` when `maxIterations > 1` with
"Resume applies to iteration 1 only; multi-iteration resume
semantics are not supported." Since a TDD slice needs the full
30-iteration budget, the session-resume path we shipped in d5c0120
is dead infrastructure that breaks dispatch mid-run.

Rip it out cleanly:
- runOneSlice drops the resumeSession param + the
  context-exhaustion safety net + sessionId/usage return fields
- executeDispatch drops the currentStory/currentSession bookkeeping
  and the token-reset threshold
- helpers totalInputTokens + isContextExhaustedError go (only used
  by the resume path)
- SANDCASTLE_SESSION_TOKEN_RESET removed from .env.example

Net: -153 lines. Each slice is again an independent sandcastle
session; token cost per slice goes up (each implementer
re-discovers context) but the multi-iteration TDD shape works.
A different cross-slice context-passing mechanism (e.g. a
story-level context summary injected into each task spec) is left
as future work.
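
The incompatibility described above can be pictured as a guard like the one below. This is a hypothetical sketch, not Sandcastle's actual source: `assertResumeAllowed`, `canDispatch`, and the options-object shape are assumptions made for illustration. Only the error text and the `resumeSession` / `maxIterations` names come from the commit message.

```javascript
// Hypothetical model of the constraint quoted in the commit message.
// Not Sandcastle's real implementation.
function assertResumeAllowed({ resumeSession, maxIterations = 1 }) {
  if (resumeSession && maxIterations > 1) {
    throw new Error(
      "Resume applies to iteration 1 only; multi-iteration resume " +
        "semantics are not supported."
    );
  }
}

// Returns true when the (assumed) option combination would be accepted.
function canDispatch(options) {
  try {
    assertResumeAllowed(options);
    return true;
  } catch {
    return false;
  }
}

// A TDD slice with a 30-iteration budget cannot also resume a session.
console.log(canDispatch({ resumeSession: "prev-session", maxIterations: 30 }));
// Dropping the resume request makes the same budget acceptable.
console.log(canDispatch({ maxIterations: 30 }));
```

Under this model, keeping the 30-iteration budget leaves no way to pass `resumeSession`, which matches the decision to remove the resume path.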
2026-05-14 11:48:32 +02:00
d5c01209ea feat(work): resume implementer session across same-story slices
Wires sandcastle's native `resumeSession` into the dispatch loop so
the implementer walks into task N already knowing what task N-1
discovered — repo layout, helper signatures, gate output, prior diff.
No scratchpad / no hand-curated context file; the agent's own Claude
Code conversation log is the carrier.

Three guardrails keep it bounded:

- Story boundary reset. `currentSession` is dropped whenever
  findNextTask returns a different story id. New domain ≈ new
  context — keeps story 03 from inheriting story 02's residue.
- Token-threshold reset. After each approved slice, sum the
  implementer's last-iteration usage (inputTokens +
  cacheCreationInputTokens + cacheReadInputTokens — caching saves
  dollars but doesn't free window space). If the sum exceeds
  SANDCASTLE_SESSION_TOKEN_RESET (default 140000, i.e. 70% of Sonnet
  4.6's 200k window), drop the session before the next task.
  Configurable via env.
- Context-exhausted safety net. If the model rejects with
  "prompt is too long" / "context_length_exceeded" / similar, the
  retry loop drops the session and re-runs the attempt fresh
  exactly once. Doesn't count against SANDCASTLE_MAX_ATTEMPTS
  (different failure mode).
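
A minimal sketch of the two helpers these guardrails rely on. The names `totalInputTokens` and `isContextExhaustedError`, the usage field names, the env var, and the 140000 default come from the commit messages. The function bodies, the pattern list, and `sessionTokenReset` are assumptions, since the commit does not show the implementation.

```javascript
// Sum every token class that occupies the context window. Cached tokens
// are cheaper to bill but still take up window space, so they count too.
function totalInputTokens(usage) {
  return (
    (usage?.inputTokens ?? 0) +
    (usage?.cacheCreationInputTokens ?? 0) +
    (usage?.cacheReadInputTokens ?? 0)
  );
}

// Assumed matcher list; the commit names these two messages "or similar".
const CONTEXT_EXHAUSTED_PATTERNS = [
  /prompt is too long/i,
  /context_length_exceeded/i,
];

function isContextExhaustedError(err) {
  const message = String(err?.message ?? err ?? "");
  return CONTEXT_EXHAUSTED_PATTERNS.some((re) => re.test(message));
}

// Hypothetical helper: read the threshold from the environment, falling
// back to the documented default when the variable is unset or invalid.
function sessionTokenReset(env = process.env) {
  const parsed = Number.parseInt(env.SANDCASTLE_SESSION_TOKEN_RESET ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 140000;
}
```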

Reviewer always runs fresh — each approve/reject decision should be
independent of prior tasks to keep the gate honest. Within a single
slice's reject-fixup retries, the implementer also carries forward
across attempts (so attempt 2 sees attempt 1's reasoning + the
reviewer notes), but that's per-slice cumulative, not cross-slice.

runOneSlice now returns { sessionId, usage } so executeDispatch can
make the carry-or-reset decision per slice.
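
The carry-or-reset decision could look roughly like this. `nextSessionId` and the shape of the `previous` record are hypothetical. Only the story-boundary rule, the token-threshold rule, and the `sessionId` / `usage` fields are taken from the message above.

```javascript
// Hypothetical helper: decide which session id (if any) the next slice
// should resume, given what the previous approved slice returned.
function nextSessionId({ previous, nextStoryId, tokenLimit = 140000 }) {
  // Nothing to carry on the very first slice.
  if (!previous || !previous.sessionId) return undefined;

  // Story boundary reset: a different story starts with a fresh session.
  if (previous.storyId !== nextStoryId) return undefined;

  // Token-threshold reset: count all input token classes, cached or not.
  const used =
    (previous.usage?.inputTokens ?? 0) +
    (previous.usage?.cacheCreationInputTokens ?? 0) +
    (previous.usage?.cacheReadInputTokens ?? 0);
  if (used > tokenLimit) return undefined;

  // Same story and room left in the window: carry the session forward.
  return previous.sessionId;
}
```

Returning `undefined` for every reset case keeps the caller simple: it passes the result straight through and a missing value means "start fresh".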
2026-05-13 20:13:30 +02:00
edbc6a8fad feat(work): dispatch loops + auto-ticks state on approve
Previously the orchestrator ran exactly one implementer + reviewer pair,
printed "(Automatic state mutation by the orchestrator is v2.)", and
exited — the human had to tick the bullet, flip story status, rebuild
state, and re-invoke for every slice. V2 closes the loop:

- Parses the JSON the implementer + reviewer prompts ask the agents to
  emit (`parseAgentJson` — tolerates both ```json fenced and bare
  trailing { ... } shapes). The reviewer's `decision` and the
  implementer's `status` are the orchestrator's discriminators.
- On approve: ticks the bullet in `_story.md` and writes it back. If
  the story now has zero unchecked bullets, flips its frontmatter
  `status: in-progress → done`; if all sibling stories are also done,
  flips the epic's frontmatter the same way. Commits the mutation on
  the host as a separate `chore(work): tick/finish ...` commit so the
  implementer's slice commit stays clean. `_state.json` regenerates
  via the existing pre-commit `rebuild-state` hook.
- On reject: re-dispatches the implementer with the reviewer's notes
  appended to TASK_FILE_CONTENT, bounded by SANDCASTLE_MAX_ATTEMPTS
  (default 3). On the (max+1)th reject the loop exits 1 with the last
  notes printed.
- After every approved slice, calls findNextTask again and dispatches
  the next ready bullet — including across story boundaries (the
  state-builder treats any non-done story with satisfied deps as
  ready, so flipping story 01 to done unblocks story 02 automatically).
- Flags: `--once` (legacy single-slice behavior) and `--max-tasks N`
  bound the loop. Default is unlimited — matches the
  continuous-execution preference.
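
One way `parseAgentJson` might tolerate both output shapes is sketched below. The function name and the two accepted shapes come from the commit message. The body, the `FENCE` constant, and the choice to return `null` on failure are assumptions, not the committed implementation.

```javascript
// Three backticks, built at runtime so this example stays fence-safe.
const FENCE = "`".repeat(3);

function parseAgentJson(output) {
  const text = String(output ?? "");

  // Shape 1: a fenced json block. Prefer the last one in the output.
  const fenced = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE, "g");
  const blocks = [...text.matchAll(fenced)];
  if (blocks.length > 0) {
    try {
      return JSON.parse(blocks[blocks.length - 1][1]);
    } catch {
      // Malformed fenced block: fall through to the bare-object scan.
    }
  }

  // Shape 2: a bare trailing { ... }. Anchor on the last closing brace and
  // widen leftwards over candidate opening braces until one parses.
  const end = text.lastIndexOf("}");
  if (end === -1) return null;
  let start = text.lastIndexOf("{", end);
  while (start !== -1) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Not a complete object from this brace; try an earlier one.
    }
    if (start === 0) break;
    start = text.lastIndexOf("{", start - 1);
  }
  return null;
}
```

The leftward scan matters for nested objects: the innermost opening brace fails to parse on its own, and the loop keeps widening until the outer object is captured.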

Auth/sandbox setup is now pulled out of the per-iteration path so the
loop reuses one sandbox across slices.
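
The on-approve mutation of `_story.md` described above can be sketched as two pure string transforms. Everything here is an assumption for illustration: `tickBullet`, `finishIfComplete`, the `- [ ]` task-list syntax, and a frontmatter line reading exactly `status: in-progress`. The commit message does not show the real file format or helper names.

```javascript
// Hypothetical: tick the first unchecked task-list bullet with this text.
function tickBullet(markdown, bulletText) {
  let ticked = false;
  const lines = markdown.split("\n").map((line) => {
    const m = /^(\s*[-*] )\[ \] (.*)$/.exec(line);
    if (!ticked && m && m[2].trim() === bulletText.trim()) {
      ticked = true;
      return m[1] + "[x] " + m[2];
    }
    return line;
  });
  return { markdown: lines.join("\n"), ticked };
}

// Hypothetical: flip the status line once no unchecked bullets remain.
function finishIfComplete(markdown) {
  const hasUnchecked = /^\s*[-*] \[ \] /m.test(markdown);
  if (hasUnchecked) return markdown;
  return markdown.replace(/^status: in-progress$/m, "status: done");
}

const story = [
  "---",
  "status: in-progress",
  "---",
  "- [x] first slice",
  "- [ ] second slice",
].join("\n");

const result = tickBullet(story, "second slice");
console.log(finishIfComplete(result.markdown));
```

Keeping both steps free of file I/O means the orchestrator can test them directly and do the read, write, and commit around them.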
2026-05-13 19:43:11 +02:00
d6bf2f638f docs(env): document sandcastle iteration env vars in .env.example
The three SANDCASTLE_*_ITERATIONS overrides landed inline in
decompose.mjs and dispatch.mjs (commit 26aa97f) but weren't
surfaced in .env.example. Adds them with the same tuning guidance
the inline comments carry, so users discover the knobs from the
canonical env reference instead of having to grep the code.
2026-05-13 18:57:50 +02:00
e734a9e7a1 docs: subscription auth is the primary sandcastle flow, API key is fallback 2026-05-13 09:31:33 +02:00
88edde342e docs: README + CLAUDE.md + .env.example + quickref reflect latest gates 2026-05-13 09:05:10 +02:00
677a45b52f fix: use port 5433 for Docker PostgreSQL to avoid conflicts with local instance 2026-04-06 15:37:49 +02:00
6cff55d6d3 feat: scaffold root workspace files (Turborepo + pnpm) 2026-04-06 14:04:41 +02:00