feat(work): resume implementer session across same-story slices

Wires sandcastle's native `resumeSession` into the dispatch loop so the implementer walks into task N already knowing what task N-1 discovered: repo layout, helper signatures, gate output, prior diff. No scratchpad and no hand-curated context file; the agent's own Claude Code conversation log is the carrier.

Three guardrails keep it bounded:

- Story boundary reset. `currentSession` is dropped whenever findNextTask returns a different story id. New domain ≈ new context, which keeps story 03 from inheriting story 02's residue.
- Token-threshold reset. After each approved slice, sum the implementer's last-iteration usage (inputTokens + cacheCreationInputTokens + cacheReadInputTokens; caching saves dollars but doesn't free window space). If the sum is above SANDCASTLE_SESSION_TOKEN_RESET (default 140000 ≈ 70% of Sonnet 4.6's 200k), drop the session before the next task. Configurable via env.
- Context-exhausted safety net. If the model rejects with "prompt is too long" / "context_length_exceeded" / similar, the retry loop drops the session and re-runs the attempt fresh exactly once. This doesn't count against SANDCASTLE_MAX_ATTEMPTS (different failure mode).

Reviewer always runs fresh: each approve/reject decision should be independent of prior tasks to keep the gate honest.

Within a single slice's reject-fixup retries, the implementer also carries forward across attempts (so attempt 2 sees attempt 1's reasoning + the reviewer notes), but that's per-slice cumulative, not cross-slice.

runOneSlice now returns { sessionId, usage } so executeDispatch can make the carry-or-reset decision per slice.
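
The carry-or-reset decision described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this commit: the `Usage` and `SliceResult` types and the helpers `tokenResetThreshold`, `totalInputTokens`, and `nextSession` are hypothetical names. Only the three usage field names, the 140000 default, and the `{ sessionId, usage }` return shape come from the message.

```typescript
// Hypothetical shapes: field names follow the commit message, the types are assumed.
interface Usage {
  inputTokens: number;
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
}

interface SliceResult {
  sessionId: string;
  usage: Usage;
}

const DEFAULT_TOKEN_RESET = 140_000;

// Parse the raw SANDCASTLE_SESSION_TOKEN_RESET value, falling back to the
// default when it is unset, empty, non-numeric, or not positive.
function tokenResetThreshold(raw: string | undefined): number {
  const parsed = Number(raw);
  return raw !== undefined && Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_TOKEN_RESET;
}

// Cached tokens still occupy the context window, so all three counters are summed.
function totalInputTokens(usage: Usage): number {
  return (
    usage.inputTokens +
    usage.cacheCreationInputTokens +
    usage.cacheReadInputTokens
  );
}

// Returns the session id to resume, or undefined to start the next task fresh.
function nextSession(
  prev: SliceResult | undefined,
  prevStoryId: string | undefined,
  nextStoryId: string,
  threshold: number,
): string | undefined {
  if (prev === undefined) return undefined; // nothing to resume yet
  if (prevStoryId !== nextStoryId) return undefined; // story boundary reset
  if (totalInputTokens(prev.usage) > threshold) return undefined; // token-threshold reset
  return prev.sessionId; // same story, under threshold: carry the session forward
}
```

With a threshold of 140000, a prior slice that used 20000 + 10000 + 90000 = 120000 input tokens on the same story would be resumed, while one that used 150000 in total, or one followed by a different story id, would start fresh. The context-exhausted retry is a separate path and is not covered by this sketch.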
2026-05-13 20:13:30 +02:00
parent 81a791c5fd
commit d5c01209ea
2 changed files with 172 additions and 14 deletions
--- a/.env.example
+++ b/.env.example
@@ -88,3 +88,14 @@ CMS_URL=http://localhost:3001
 # notes printed. Bump for tricky slices; lower for fast-feedback iteration.
 #
 # SANDCASTLE_MAX_ATTEMPTS=3
+
+# Session-resume token threshold. The orchestrator passes the prior
+# implementer's session ID into the next slice's run() via sandcastle's
+# `resumeSession` — the agent walks into task 2 already knowing where
+# helpers live, what the prior diff looked like, which gates passed.
+# When the prior iteration's total input tokens (input + cacheRead +
+# cacheCreation) crosses this threshold the orchestrator drops the
+# session and starts the next task fresh, avoiding mid-slice context
+# exhaustion. Default 140000 ≈ 70% of Sonnet 4.6's 200k window.
+#
+# SANDCASTLE_SESSION_TOKEN_RESET=140000