fix(sandcastle): make Dockerfile match sandcastle's expected shape + document macOS keychain quirk

Two separate sandbox blockers surfaced when the user tried
`pnpm work decompose --execute`:

1. **Container died on exec** — our Dockerfile had:
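     - WORKDIR /workspace + CMD ["bash"] (bash exits at once when
       started without an interactive stdin, so the container stops
       before sandcastle can exec into it)
     - No `agent` user (sandcastle execs as the UID:GID the image was
       built with)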
     - node:22-bookworm-slim (missing some build deps the install
       script wants)
   Sandcastle expects:
     - A non-root `agent` user with home at /home/agent (sandcastle
       does `git config --global --add safe.directory /home/agent/workspace`,
       which fails if the user doesn't exist or the container exited)
     - ENTRYPOINT ["sleep", "infinity"] so the container survives
       the gap between sandcastle creating it and exec'ing in
   Replaced .sandcastle/Dockerfile with the shape `sandcastle init`
   would generate (verified against
   node_modules/@ai-hero/sandcastle/dist/InitService.js):
     - node:22-bookworm (full, not slim) for build tooling
     - apt-get installs git + curl + jq
     - corepack-pinned pnpm@9
     - ARG AGENT_UID=1000 + AGENT_GID=1000; sandcastle's
       build-image passes the host's UID/GID by default
     - `groupmod -o -g $AGENT_GID node` + `usermod -o ... node` —
       the `-o` (non-unique) flag is required because macOS hosts
       have UID:501 GID:20, and GID 20 collides with Debian's
       `dialout` group in the base image (without -o, groupmod
       fails with "GID '20' already exists")
     - USER ${AGENT_UID}:${AGENT_GID}, then install Claude Code CLI
       via the official installer
     - ENV PATH includes /home/agent/.local/bin
     - WORKDIR /home/agent (sandcastle overrides per-run anyway)
     - ENTRYPOINT ["sleep", "infinity"] keeps the container alive

2. **"Not logged in · Please run /login"** inside the container —
   Claude Code on macOS stores credentials in the Keychain, NOT in
   ~/.claude/.credentials.json. Sandcastle's bind-mount of ~/.claude
   finds nothing usable. Documented the workaround:
     - README.md "Sandcastle setup (one-time)" — macOS-specific
       block with the `security find-generic-password ... > ~/.claude/.credentials.json`
       one-liner + chmod 600 + the security trade-off (plaintext
       file vs keychain isolation)
     - docs/guides/runbook.md "Using Sandcastle → Prerequisites" —
       step 3 (Authentication) gets a "macOS quirk" subsection with
       the same extraction one-liner + the API-key fallback as the
       alternative path
     - scripts/work/{dispatch,decompose}.mjs — when the sandcastle
       error matches /Not logged in|Please run \/login/ AND we're on
       darwin, the dispatcher prints the keychain-extraction
       commands + the API-key fallback inline above the generic
       "See runbook" line, so future agents discover the fix at the
       failure site
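
   The darwin-only hint described above can be sketched as follows.
   This is a minimal illustration, not the code in
   scripts/work/{dispatch,decompose}.mjs: `needsKeychainHint`,
   `formatFailure`, and the hint wording are hypothetical. Only the
   regex, the darwin check, and the "See runbook" line come from the
   description.

```javascript
// Hypothetical sketch: helper names and message text are illustrative only.
const NOT_LOGGED_IN = /Not logged in|Please run \/login/;

// True when the sandcastle error looks like the missing-credentials case
// AND the host is macOS, where credentials live in the Keychain.
function needsKeychainHint(errorText, platform = process.platform) {
  return platform === "darwin" && NOT_LOGGED_IN.test(errorText);
}

// Builds the failure message: the macOS-specific hint (if any) goes
// above the generic "See runbook" line.
function formatFailure(errorText, platform = process.platform) {
  const lines = [];
  if (needsKeychainHint(errorText, platform)) {
    lines.push(
      "Claude Code on macOS keeps credentials in the Keychain.",
      'See README.md "Sandcastle setup (one-time)" for the extraction',
      "one-liner, or set ANTHROPIC_API_KEY as a fallback.",
    );
  }
  lines.push("See runbook");
  return lines.join("\n");
}
```

   Passing `platform` explicitly keeps the check testable on a
   non-macOS host; in normal use the default `process.platform` applies.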

The image rebuilds clean (`pnpm exec sandcastle docker
build-image`) at ~1.95GB and the container survives sandcastle's
exec — confirmed by reaching the "Not logged in" stage (which is
the next-layer issue, not the Dockerfile issue).
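
For a manual `docker build` outside sandcastle, the UID/GID alignment
from item 1 could be reproduced with something like the sketch below.
`hostBuildArgs` is a hypothetical helper, not part of sandcastle; it
assumes only the Dockerfile's AGENT_UID / AGENT_GID build args and
their 1000 defaults.

```javascript
// Hypothetical sketch: maps the host user's UID/GID onto the
// Dockerfile's AGENT_UID / AGENT_GID build args.
// process.getuid/getgid are absent on Windows, so fall back to the
// Dockerfile defaults (1000) there.
function hostBuildArgs(
  uid = process.getuid?.() ?? 1000,
  gid = process.getgid?.() ?? 1000,
) {
  return [
    "--build-arg", `AGENT_UID=${uid}`,
    "--build-arg", `AGENT_GID=${gid}`,
  ];
}
```

On a macOS host this would typically yield AGENT_UID=501 and
AGENT_GID=20, the pair that triggers the `dialout` collision and
makes `-o` necessary.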

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:02:34 +02:00
parent cd0a332443
commit 7737358509
3 changed files with 71 additions and 16 deletions


@@ -1,26 +1,52 @@
-# Sandcastle sandbox image — runs the implementer + reviewer agents.
+# Sandcastle sandbox image — runs the implementer + reviewer + decomposer
+# agents. Shape required by @ai-hero/sandcastle: a non-root `agent` user
+# (UID/GID aligned with the host so bind-mounted files share owner), Claude
+# Code CLI on PATH, and a long-running ENTRYPOINT so the container survives
+# the gap between sandcastle creating it and exec'ing into it.
 #
-# Includes Claude Code CLI so the sandbox can authenticate via the host's
-# mounted ~/.claude/ session (sandcastle issue #191 workaround — subscription
-# auth, not API-key auth, is our primary flow). Falls back to ANTHROPIC_API_KEY
-# when no host credentials are available.
+# Authenticates via the host's mounted ~/.claude/ session (subscription
+# mode — sandcastle issue #191 workaround, our primary flow). Falls back
+# to ANTHROPIC_API_KEY when no host credentials are present.
-FROM node:22-bookworm-slim
+FROM node:22-bookworm
-# pnpm via corepack (matches the repo's pnpm version)
+# System deps — git for worktree ops, curl for the Claude installer, jq for
+# JSON tooling agents use, plus ca-certificates implicit in the base image.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    curl \
+    jq \
+    && rm -rf /var/lib/apt/lists/*
+# pnpm via corepack (matches the repo's packageManager version).
 RUN corepack enable && corepack prepare pnpm@9 --activate
+# Build-args for UID/GID alignment: `sandcastle docker build-image` passes
+# the host user's UID/GID by default so image-built files and bind-mounted
+# files share an owner without runtime chown.
+ARG AGENT_UID=1000
+ARG AGENT_GID=1000
+# Rename the base image's "node" user to "agent" and align UID/GID.
+# `-o` (non-unique) is required because the host's GID may collide with a
+# pre-existing system group in the base image (e.g. macOS UID:501 GID:20
+# collides with Debian's `dialout` group at GID 20). Allowing a duplicate
+# GID is safe here — only one user occupies the sandbox.
+RUN groupmod -o -g $AGENT_GID node && \
+    usermod -o -u $AGENT_UID -g $AGENT_GID -d /home/agent -m -l agent node
+USER ${AGENT_UID}:${AGENT_GID}
 # Claude Code CLI — used by sandcastle's claudeCode() agent provider.
 # The CLI reads credentials from ~/.claude/ inside the container; the host
 # mounts its ~/.claude/ over that path at sandbox start.
-RUN npm install -g @anthropic-ai/claude-code
+RUN curl -fsSL https://claude.ai/install.sh | bash
-# Minimal system deps for git operations + healthchecks.
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    git \
-    ca-certificates \
-    && rm -rf /var/lib/apt/lists/*
+ENV PATH="/home/agent/.local/bin:$PATH"
-WORKDIR /workspace
+WORKDIR /home/agent
-CMD ["bash"]
+# In worktree sandbox mode, sandcastle bind-mounts the git worktree at
+# ${SANDBOX_REPO_DIR} and overrides the working directory to that path at
+# container start. The Dockerfile's WORKDIR is just the default home.
+ENTRYPOINT ["sleep", "infinity"]