Sandcastle rejects `resumeSession` when `maxIterations > 1` with
"Resume applies to iteration 1 only; multi-iteration resume
semantics are not supported." Since a TDD slice needs the full
30-iteration budget, the session-resume path we shipped in d5c0120
is dead infrastructure that breaks dispatch mid-run.
Rip it out cleanly:
- runOneSlice drops the resumeSession param, the
context-exhaustion safety net, and the sessionId/usage return fields
- executeDispatch drops the currentStory/currentSession bookkeeping
and the token-reset threshold
- helpers totalInputTokens + isContextExhaustedError go (only used
by the resume path)
- SANDCASTLE_SESSION_TOKEN_RESET removed from .env.example
Net: -153 lines. Each slice is again an independent sandcastle
session; token cost per slice goes up (each implementer
re-discovers context) but the multi-iteration TDD shape works.
A different cross-slice context-passing mechanism (e.g. a
story-level context summary injected into each task spec) is left
as future work.

Wires sandcastle's native `resumeSession` into the dispatch loop so
the implementer walks into task N already knowing what task N-1
discovered: repo layout, helper signatures, gate output, prior diff.
No scratchpad / no hand-curated context file; the agent's own Claude
Code conversation log is the carrier.
Three guardrails keep it bounded:
- Story boundary reset. `currentSession` is dropped whenever
findNextTask returns a different story id. New domain ≈ new
context, so story 03 doesn't inherit story 02's residue.
- Token-threshold reset. After each approved slice, sum the
implementer's last-iteration usage (inputTokens +
cacheCreationInputTokens + cacheReadInputTokens; caching saves
dollars but doesn't free window space). If the sum exceeds
SANDCASTLE_SESSION_TOKEN_RESET (default 140000 ≈ 70% of Sonnet
4.6's 200k window), drop the session before the next task. The
threshold is configurable via env.
- Context-exhausted safety net. If the model rejects with
"prompt is too long" / "context_length_exceeded" / similar, the
retry loop drops the session and re-runs the attempt fresh
exactly once. Doesn't count against SANDCASTLE_MAX_ATTEMPTS
(different failure mode).
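The three guardrails reduce to a few small pure functions. A minimal sketch follows, assuming the usage object carries the three token fields named above and that provider errors expose a `message` string. `totalInputTokens` and `isContextExhaustedError` are written from the descriptions in this log rather than copied from the codebase; `sessionTokenReset` and `decideSession` are hypothetical names, and matching only the two quoted error strings is an assumption about what "similar" covers.

```javascript
// Sketch of the session guardrails. totalInputTokens and isContextExhaustedError
// follow the descriptions above; sessionTokenReset and decideSession are
// hypothetical helpers for illustration.

const DEFAULT_SESSION_TOKEN_RESET = 140000; // ≈ 70% of a 200k-token window

// Everything that occupies the context window counts, including cached
// tokens: caching lowers cost but does not free window space.
function totalInputTokens(usage) {
  if (!usage) return 0;
  return (
    (usage.inputTokens ?? 0) +
    (usage.cacheCreationInputTokens ?? 0) +
    (usage.cacheReadInputTokens ?? 0)
  );
}

// Read SANDCASTLE_SESSION_TOKEN_RESET, falling back to the default when it is
// unset or not a positive integer.
function sessionTokenReset(env) {
  const parsed = Number.parseInt(env.SANDCASTLE_SESSION_TOKEN_RESET ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_SESSION_TOKEN_RESET;
}

// Assumed error shapes: only the two strings quoted in the commit message.
const CONTEXT_EXHAUSTED_PATTERNS = [/prompt is too long/i, /context_length_exceeded/i];

function isContextExhaustedError(err) {
  const message = String(err?.message ?? err ?? "");
  return CONTEXT_EXHAUSTED_PATTERNS.some((re) => re.test(message));
}

// Carry-or-reset decision made before dispatching the next task.
function decideSession({ currentSession, currentStoryId, nextStoryId, lastUsage, threshold }) {
  if (!currentSession) return null;
  if (currentStoryId !== nextStoryId) return null; // story boundary: start fresh
  if (totalInputTokens(lastUsage) > threshold) return null; // window too full
  return currentSession; // same story, room left: resume
}
```

With the default threshold, a slice whose last iteration reported 150000 combined input tokens would drop the session even when the next task belongs to the same story.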
The reviewer always runs fresh: each approve/reject decision should be
independent of prior tasks to keep the gate honest. Within a single
slice's reject-fixup retries, the implementer's session also carries
forward across attempts (so attempt 2 sees attempt 1's reasoning plus
the reviewer notes); that carry is cumulative within the slice and is
separate from the cross-slice carry the guardrails above govern.
runOneSlice now returns { sessionId, usage } so executeDispatch can
make the carry-or-reset decision per slice.

Previously the orchestrator ran exactly one implementer + reviewer pair,
printed "(Automatic state mutation by the orchestrator is v2.)", and
exited; the human had to tick the bullet, flip story status, rebuild
state, and re-invoke for every slice. V2 closes the loop:
- Parses the JSON that the implementer and reviewer prompts ask the
agents to emit (`parseAgentJson` tolerates both a ```json fenced block
and a bare trailing { ... } object). The reviewer's `decision` and the
implementer's `status` are the orchestrator's discriminators.
- On approve: ticks the bullet in `_story.md` and writes it back. If
the story now has zero unchecked bullets, flips its frontmatter
`status: in-progress → done`; if all sibling stories are also done,
flips the epic's frontmatter the same way. Commits the mutation on
the host as a separate `chore(work): tick/finish ...` commit so the
implementer's slice commit stays clean. `_state.json` regenerates
via the existing pre-commit `rebuild-state` hook.
- On reject: re-dispatches the implementer with the reviewer's notes
appended to TASK_FILE_CONTENT, bounded by SANDCASTLE_MAX_ATTEMPTS
(default 3). On the (max+1)th reject the loop exits 1 with the last
notes printed.
- After every approved slice, calls findNextTask again and dispatches
the next ready bullet, including across story boundaries (the
state-builder treats any non-done story with satisfied deps as
ready, so flipping story 01 to done unblocks story 02 automatically).
- Flags: `--once` (legacy single-slice behavior) and `--max-tasks N`
bound the loop. By default the loop is unbounded, matching the
continuous-execution preference.
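The JSON parsing and the approve-side state mutation are plain text manipulation. A minimal sketch, assuming `_story.md` uses `- [ ]` / `- [x]` task bullets below a `---`-delimited frontmatter and that the dispatched bullet can be located by its text. The body of `parseAgentJson` here only illustrates the two tolerated shapes and is not the project's code; `tickBullet`, `hasUncheckedBullets`, and `markDone` are hypothetical helpers, and the epic-level flip and the host-side commit are left out.

```javascript
// Sketch only: parseAgentJson mirrors the behavior described above;
// tickBullet, hasUncheckedBullets and markDone are hypothetical helpers.

const FENCE = "`".repeat(3); // built at runtime to keep the regex readable
const FENCED_JSON = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE, "g");

// Accept either a json-tagged fenced block (the last one wins) or a bare
// trailing { ... } object. Returns null when nothing parses.
function parseAgentJson(text) {
  const fences = [...text.matchAll(FENCED_JSON)];
  if (fences.length > 0) {
    try {
      return JSON.parse(fences[fences.length - 1][1]);
    } catch {
      return null;
    }
  }
  // No fence: walk '{' positions backwards from the last '}' and widen the
  // window until the slice parses, so nested objects are handled.
  const end = text.lastIndexOf("}");
  if (end === -1) return null;
  let start = text.lastIndexOf("{", end);
  while (start !== -1) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      start = start === 0 ? -1 : text.lastIndexOf("{", start - 1);
    }
  }
  return null;
}

// Tick the unchecked bullet whose line contains bulletText.
function tickBullet(markdown, bulletText) {
  const lines = markdown.split("\n");
  const i = lines.findIndex((line) => /^\s*- \[ \] /.test(line) && line.includes(bulletText));
  if (i === -1) throw new Error(`no unchecked bullet matching: ${bulletText}`);
  lines[i] = lines[i].replace("- [ ] ", "- [x] ");
  return lines.join("\n");
}

const hasUncheckedBullets = (markdown) => /^[ \t]*- \[ \] /m.test(markdown);

// Flip `status: in-progress` to `status: done`, touching only the leading
// frontmatter so body text that mentions a status is left alone.
function markDone(markdown) {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return markdown;
  const front = match[1].replace(/^status:[ \t]*in-progress[ \t]*$/m, "status: done");
  return `---\n${front}\n---` + markdown.slice(match[0].length);
}
```

On approve, these would chain as: tick the bullet, then call `markDone` only when `hasUncheckedBullets` returns false for the updated file.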
Auth/sandbox setup is now pulled out of the per-iteration path so the
loop reuses one sandbox across slices.
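The loop shape in the list above (re-dispatch on reject, exit 1 on the (max+1)th reject, keep pulling ready tasks until none remain or the task cap is hit, one sandbox for the whole run) can be sketched with injected dependencies. This is a minimal sketch under assumptions: `runDispatchLoop`, `parseLoopFlags`, `setupSandbox`, `runImplementer`, `runReviewer`, and `applyApproval` are hypothetical names, `findNextTask` is assumed to return `null` when nothing is ready, session carry is not modeled, and treating `--once` as `--max-tasks 1` is an interpretation of "legacy single-slice behavior".

```javascript
// Sketch of the V2 dispatch loop with injected dependencies. All function
// names here are hypothetical stand-ins for the orchestrator's real code.

async function runDispatchLoop({
  setupSandbox, // auth + sandbox setup, done once per run
  findNextTask, // () => next ready task, or null when none is ready
  runImplementer, // (sandbox, task, reviewerNotes) => Promise<void>
  runReviewer, // (sandbox, task) => Promise<{ decision, notes }>
  applyApproval, // (task) => Promise<void>: tick bullet, flip status, commit
  maxAttempts = 3, // SANDCASTLE_MAX_ATTEMPTS default
  maxTasks = Infinity, // unbounded unless --once / --max-tasks N
}) {
  const sandbox = await setupSandbox(); // reused across every slice
  let completed = 0;
  while (completed < maxTasks) {
    const task = await findNextTask();
    if (!task) break; // nothing ready: done
    let notes = null;
    let rejects = 0;
    for (;;) {
      await runImplementer(sandbox, task, notes);
      const review = await runReviewer(sandbox, task);
      if (review.decision === "approve") break;
      rejects += 1;
      notes = review.notes; // appended to the next attempt's task spec
      if (rejects > maxAttempts) return { exitCode: 1, completed, lastNotes: notes };
    }
    await applyApproval(task);
    completed += 1;
  }
  return { exitCode: 0, completed };
}

// --once is treated as --max-tasks 1; the last flag given wins.
function parseLoopFlags(argv) {
  let maxTasks = Infinity;
  for (let i = 0; i < argv.length; i += 1) {
    if (argv[i] === "--once") {
      maxTasks = 1;
    } else if (argv[i] === "--max-tasks") {
      const n = Number.parseInt(argv[i + 1], 10);
      if (!Number.isInteger(n) || n < 1) throw new Error("--max-tasks expects a positive integer");
      maxTasks = n;
      i += 1; // skip the consumed value
    }
  }
  return { maxTasks };
}
```

In a real entry point the result's `exitCode` would feed `process.exit`, with `lastNotes` printed first when the reject cap is hit.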
The three SANDCASTLE_*_ITERATIONS overrides landed inline in
decompose.mjs and dispatch.mjs (commit 26aa97f) but weren't
surfaced in .env.example. Adds them with the same tuning guidance
the inline comments carry, so users discover the knobs from the
canonical env reference instead of having to grep the code.