Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.
Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
-> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
-> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
-> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
-> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
-> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
-> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
-> docs/work/prds/ci-security-and-supply-chain.prd.md
Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.
System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
new path.
template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).
Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
bare-slug id field + ISO created: timestamp.
Doc reference updates: glossary, runbook, agent-first-workflow-
and-conformance, reviewer prompt, ADR-020, ADR-022, ADR-023 all
point at the new paths/slugs.
| id | title | type | status | author | created | updated | adr |
|---|---|---|---|---|---|---|---|
| library-evaluation-policy | Library evaluation policy — skill, traces, enforcement stack | prd | approved | danijel | 2026-05-14T00:00:00Z | 2026-05-14T19:16:52.691Z | adr-022 |
Problem
This template ships with a deliberately narrow third-party surface — every
feature package today holds the same 6 runtime deps and nothing else. That
discipline is uncodified. New dependencies enter via pnpm add <pkg> with no
checkpoint between intent and lockfile, and three recent signals show the gap:
- The 2026-05-14 grill session nearly added trpc-to-openapi + zod-to-json-schema, a build-time generator, before someone asked "who calls this code path?" The honest answer was "nobody: all callers are TypeScript via createCaller."
- ADR-002 (Inversify), ADR-014 (Sentry), ADR-017 (OpenTelemetry) each record library decisions, but every record was written after adoption. No mechanism exists to catch a bad choice before it becomes a migration project.
- The repo is EU-resident and GDPR-bound. A library that defaults to a US-only SaaS endpoint silently moves user data out of the EU the moment it's imported with defaults. Nothing currently flags this.
ADR-022 codifies the policy. This PRD implements it.
Goal
A four-layer enforcement stack — Claude hook, skill, pre-commit hook, sandcastle
reviewer prompt — that makes every new runtime dependency in a feature- or
core-tier package produce a permanent library trace at
docs/library-decisions/<YYYY-MM-DD>-<package-name>.md, with rejection traces
treated as first-class records.
In scope
- The evaluate-library skill at .claude/skills/evaluate-library/SKILL.md: authoritative agent runbook; walks 8 hard filters + 3 prompts; writes the trace.
- The human reading-room guide at docs/guides/adding-a-library.md with worked examples (approved + rejected).
- The docs/library-decisions/ directory + _template.md schema reference.
- A Zod-validated trace-schema module (scripts/library-decisions/schema.mjs) shared by the skill, the pre-commit checker, and the generator.
- Claude PreToolUse hook (.claude/hooks/library-policy-nudge.sh): matches pnpm add / pnpm i <pkg> in Bash invocations; emits skill reminder.
- Claude PostToolUse hook (also in library-policy-nudge.sh, dispatching by event type): matches Edit/Write against any **/package.json.
- Pre-commit hook check script (scripts/library-decisions/check.mjs) wired into .husky/pre-commit. Blocks the commit when a new runtime dep is staged in a feature/core package and no sibling trace file is staged.
- Sandcastle reviewer prompt update (.sandcastle/reviewer.prompt.md): the reviewer agent runs the same check before issuing approve/reject.
- Optional-cores generator templates (turbo/generators/templates/core-package/) emit pre-shipped traces per direct dep, dated at generation time, marked decision: approved, citing the relevant ADR (015/016/018). Five generators updated: events, realtime, audit, trpc, ui.
- Backfill traces for every existing runtime dependency in feature- and core-tier packages, dated 2026-05-14, citing existing ADRs (002/014/017) where applicable. Approx 10 traces.
- CLAUDE.md "Key Conventions" gets a one-line bullet pointing to ADR-022 + the guide.
Out of scope
- Transitive dependency tracing: pnpm audit already handles recursive scanning.
- Bundle-size analysis: Vercel / Vite build output already reports this.
- Auto-removal of approved-then-unused deps: pnpm fallow territory.
- License auto-enforcement at the lockfile layer (license-checker plugins): defer until the policy has run for some time and we know where it leaks.
- Anything app-tier. Deps in apps/* are author's call per the tier model.
- Devdeps in any tier. Only dependencies (runtime) require traces.
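The two exemptions above (app tier, non-runtime dependency fields) reduce to a small predicate. A minimal sketch, assuming a hypothetical folder layout: this PRD only names apps/* as app-tier, so the packages/features/ and packages/core/ prefixes below are placeholders for whatever the boundary-tag system actually uses, and tierFromPath / requiresTrace are illustrative names.

```javascript
// Hypothetical path-to-tier mapping. Only "apps/" comes from the PRD;
// the other two prefixes are assumptions for illustration.
const TIER_PREFIXES = [
  ["apps/", "app"],
  ["packages/features/", "feature"],
  ["packages/core/", "core"],
];

// Derive the tier from the path of a package.json, or null if unknown.
function tierFromPath(packageJsonPath) {
  const hit = TIER_PREFIXES.find(([prefix]) => packageJsonPath.startsWith(prefix));
  return hit ? hit[1] : null;
}

// A trace is required only for runtime deps in feature- or core-tier packages.
function requiresTrace(packageJsonPath, depField) {
  if (depField !== "dependencies") return false; // devDependencies etc. are exempt
  const tier = tierFromPath(packageJsonPath);
  return tier === "feature" || tier === "core";
}
```

Paths that match no prefix return null and are treated as not gated here; a real checker might prefer to fail closed on an unknown tier.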
Constraints
- ADR-022 is the source of truth. This PRD implements but does not extend it.
- ADR-006 + ADR-010: the tier trigger maps onto the existing boundary-tag system. No new mental model; ESLint already partitions blast radius.
- ADR-019: the sandcastle reviewer prompt is one of four enforcement layers. Whatever the agent loop does must compose with the existing prompt shape at .sandcastle/reviewer.prompt.md.
- ADR-021: release-please picks up dependency changes from commit history. The trace file landing in the same commit as the package.json change is required so release notes correlate cleanly with policy records.
- Conformance system parity: the enforcement stack mirrors the 5-gate latency pattern from ADR-012. Same vocabulary, same agent feedback loop.
- Conventional Commits: every commit produced by the implementation follows <type>(<scope>): <subject>.
- --no-verify is forbidden: the bash-guard hook already enforces this; the new pre-commit check inherits that protection.
- Skill must be deterministic from explicit args: /evaluate-library <name> --tier <feature|core|app> --target <package-path>. The Claude hook produces exactly this invocation from a pnpm add command line.
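The last constraint can be sketched as a pure function from a command line to skill invocations. This is an illustration rather than the hook itself: parsePnpmAdd and toSkillInvocations are hypothetical names, the tier and target are assumed to be resolved by the caller, and only the pnpm flags --filter/-F and --dir/-C are treated as taking a separate value.

```javascript
// Drop a trailing version specifier: "foo@1.2.3" -> "foo", "@scope/pkg@^1" -> "@scope/pkg".
function stripVersion(spec) {
  const at = spec.lastIndexOf("@");
  return at > 0 ? spec.slice(0, at) : spec;
}

// Extract package names from a `pnpm add <pkg>` / `pnpm i <pkg>` command line.
function parsePnpmAdd(command) {
  const tokens = command.trim().split(/\s+/);
  if (tokens[0] !== "pnpm" || !["add", "i"].includes(tokens[1])) return [];
  const flagsWithValue = new Set(["--filter", "-F", "--dir", "-C"]);
  const names = [];
  for (let i = 2; i < tokens.length; i++) {
    const token = tokens[i];
    if (flagsWithValue.has(token)) {
      i++; // skip the flag's value as well
      continue;
    }
    if (token.startsWith("-")) continue; // any other flag
    names.push(stripVersion(token));
  }
  return names;
}

// Build one deterministic skill invocation per package being added.
function toSkillInvocations(command, tier, target) {
  return parsePnpmAdd(command).map(
    (name) => `/evaluate-library ${name} --tier ${tier} --target ${target}`
  );
}
```

Version specifiers are stripped so that foo and foo@1.2.3 map to the same invocation; a bare pnpm i with no package argument yields no invocations.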
Success criteria
- pnpm typecheck && pnpm test && pnpm lint && pnpm conformance && pnpm fallow:audit pass green at the end of the epic.
- pnpm coverage:diff covers every changed executable line introduced by the implementation slices.
- Attempting to commit a new feature-tier runtime dep without a sibling trace file is blocked by the pre-commit hook with a clear error pointing to the skill.
- Running the evaluate-library skill against trpc-to-openapi (the rejected library from the grill session) produces a decision: rejected trace with named-consumer: fail and prose citing the conversation, in <90 seconds of agent work.
- Running pnpm turbo gen core-package events (or any other optional core) emits pre-shipped traces for every direct dep of that core into docs/library-decisions/, all decision: approved and ADR-cited.
- The Claude PreToolUse hook fires on pnpm add <anything> and emits the skill-reminder system-reminder; does not auto-deny.
- All existing runtime deps in feature- and core-tier packages have backfilled trace files dated 2026-05-14 in docs/library-decisions/.
- CLAUDE.md Key Conventions includes the one-line policy bullet.
- docs/glossary.md already includes Library trace and Pre-shipped trace entries (landed inline during the grill session).
User stories
- As a developer adding a new feature dependency, I want a deterministic skill that walks me through the 8 filters and 3 prompts in collect-cheap-skip-expensive order, so I don't forget any check and the trace file is written automatically with my answers.
- As an agent dispatched against a slice that needs a new dep, I want the Claude PreToolUse hook to inject a system-reminder pointing me at the skill the moment I'm about to run pnpm add, so I don't bypass the policy by reflex.
- As an agent editing a package.json by hand, I want the Claude PostToolUse hook to inject the same reminder, so the policy isn't sidestepped by paste-then-install.
- As a reviewer (human or agent), I want the pre-commit hook to refuse a commit that adds a runtime dep to a feature/core package without a sibling trace file, so I don't have to remember to check during review.
- As a future agent considering a previously-rejected library, I want to find the rejection trace in docs/library-decisions/ in <1s of ls/grep, so I don't re-litigate a decision that has already been made.
- As an EU-resident maintainer, I want the EU-data-residency filter to reject US-only SaaS components by default and force a self-hostable or EU-region-configured justification in the trace, so user data doesn't leave the EU silently.
- As a maintainer scaffolding an optional core via pnpm turbo gen core-package <name>, I want pre-shipped traces emitted for every direct dep of the new core, so the policy is satisfied by construction.
- As a security-conscious maintainer, I want the CVE-scan filter to run pnpm audit --audit-level=moderate at evaluation time and snapshot the result + commands into the trace, so I can re-run them later to detect drift.
- As an agent reviewing a slice in sandcastle, I want the reviewer prompt to check for trace presence + correctness, so I can reject non-compliant slices without needing a separate workflow.
- As a maintainer reading the repo for the first time, I want docs/guides/adding-a-library.md to explain the policy with worked examples (one approved, one rejected), so I understand the why and how before I face the gate myself.
Implementation decisions
Module sketch — what lands where, by package and concern (no file paths where prose suffices):
- The skill itself: .claude/skills/evaluate-library/ follows the same shape as to-prd, grill-with-docs, improve-codebase-architecture. SKILL.md is authoritative; supporting files (POLICY.md mirroring ADR-022, TRACE-TEMPLATE.md showing the YAML+headings shape, EXAMPLES/ worked cases) flesh it out. The skill is invocable via slash command /evaluate-library.
- Trace schema module: a small shared module at scripts/library-decisions/schema.mjs exporting (1) a Zod schema for the trace's frontmatter, (2) a parser that reads a trace file and returns the validated frontmatter, (3) a serializer that takes filter results + prose blocks and emits a trace file. Both the skill and the pre-commit checker import this module. Deep module: small interface (parse/serialize/validate), high leverage across the four enforcement layers.
- Pre-commit check script: scripts/library-decisions/check.mjs. Walks git diff --cached --name-only -- '**/package.json', for each file extracts newly-added dep lines via git diff --cached <file>, derives the tier from the path, and for each new runtime dep checks that docs/library-decisions/*-<name>.md is also staged with decision: approved. Exits non-zero with a pointer to the skill if any check fails. Invoked from .husky/pre-commit as step 4 (after the existing state-sync guard).
- Claude hooks: a single .claude/hooks/library-policy-nudge.sh that dispatches on tool_use_type to handle both PreToolUse (Bash with pnpm add / pnpm i <pkg> pattern) and PostToolUse (Edit/Write on **/package.json). Same style as the existing generator-first-nudge.sh. Emits a non-blocking system-reminder to stdout that the harness threads into the agent's next turn.
- Sandcastle reviewer prompt: append a "Library-trace check" section to .sandcastle/reviewer.prompt.md. The reviewer runs node scripts/library-decisions/check.mjs --staged-against <base> in the sandbox before issuing its verdict.
- Generator templates: each turbo/generators/templates/core-package/<name>/ gets a docs/library-decisions/ subtree with one .md per direct dep of that core. The generator copies these alongside the core's package.json into the workspace. Frozen via the existing turbo/generators/__snapshots__/core-package/<name>.snapshot.json mechanism.
- Backfill traces: write one trace per existing runtime dep in feature/core tier. The deps cluster naturally by ADR provenance: ADR-002 cluster (Inversify + reflect-metadata), ADR-014 cluster (Sentry family), ADR-017 cluster (OpenTelemetry family), and the un-cited cluster (payload, @trpc/server, zod, superjson, plus any others surfaced by inventory). All traces dated 2026-05-14, decision: approved, ADR citation where the cluster maps to one.
- CLAUDE.md update: one bullet in Key Conventions: "New runtime dependencies in feature- or core-tier packages require a trace at docs/library-decisions/<date>-<name>.md produced by the evaluate-library skill; see ADR-022."
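The core of the pre-commit check can be sketched as pure functions over already-loaded data. Note the swap: this sketch compares the parsed package.json before and after staging instead of scanning diff lines as described above, and it checks trace presence only, not the decision field. The names newRuntimeDeps, hasStagedTrace and missingTraces are hypothetical, as is the assumption that the date prefix is always YYYY-MM-DD; how scoped names such as @trpc/server map to file names is not specified here, so the sketch compares the name literally.

```javascript
// Runtime deps present after staging but not before.
// devDependencies and peerDependencies are ignored on purpose.
function newRuntimeDeps(before, after) {
  const old = (before && before.dependencies) || {};
  const cur = (after && after.dependencies) || {};
  return Object.keys(cur)
    .filter((name) => !Object.prototype.hasOwnProperty.call(old, name))
    .sort();
}

// Escape characters that have a special meaning inside a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// True when a trace file for exactly this dep name is among the staged paths.
function hasStagedTrace(depName, stagedPaths) {
  const pattern = new RegExp(
    `^docs/library-decisions/\\d{4}-\\d{2}-\\d{2}-${escapeRegExp(depName)}\\.md$`
  );
  return stagedPaths.some((p) => pattern.test(p));
}

// New runtime deps that have no sibling trace staged in the same commit.
function missingTraces(before, after, stagedPaths) {
  return newRuntimeDeps(before, after).filter(
    (name) => !hasStagedTrace(name, stagedPaths)
  );
}
```

Anchoring the pattern on the date prefix keeps a dep named openapi from being satisfied by a trace for trpc-to-openapi, which a bare *-<name>.md glob would allow.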
Trace schema (frontmatter) — Zod schema (lifted from ADR-022 §4):
package: string
version: string // semver range as written in package.json
tier: "app" | "feature" | "core"
decision: "approved" | "rejected"
date: ISO date string
deciders: string[]
adr: string | null // "adr-NNN" slug or null
filter-results: {
  license: SPDX-id-string
  types: "native" | `@types/${string}` | "none"
  maintenance: "active" | "dormant" | "abandoned"
  boundary-fit: "pass" | "fail"
  shadow-check: "pass" | "fail" | `shadows ${string}`
  eu-residency: "ok" | "n/a" | "self-hostable" | "fail"
  cve-scan: "clean" | `${cve-id}` | "fail"
  named-consumer: "pass" | "fail"
}
verification-commands: string[]
accepted-cves?: string[] // optional per-trace allowlist
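To make the schema concrete, here is a dependency-free stand-in for the Zod schema, written as plain predicate tables. It is a sketch, not the planned schema.mjs: validateTrace is a hypothetical name, the CVE id pattern and the three-digit adr-NNN pattern are assumptions, and the license field is only checked for being a non-empty string rather than a valid SPDX id.

```javascript
const oneOf = (...allowed) => (v) => allowed.includes(v);
const isString = (v) => typeof v === "string" && v.length > 0;
const isStringArray = (v) => Array.isArray(v) && v.every(isString);
const startsWith = (prefix) => (v) => isString(v) && v.startsWith(prefix);
const CVE_ID = /^CVE-\d{4}-\d{4,}$/; // assumed shape of `${cve-id}`

const FILTER_RULES = {
  license: isString,
  types: (v) => v === "native" || v === "none" || startsWith("@types/")(v),
  maintenance: oneOf("active", "dormant", "abandoned"),
  "boundary-fit": oneOf("pass", "fail"),
  "shadow-check": (v) => v === "pass" || v === "fail" || startsWith("shadows ")(v),
  "eu-residency": oneOf("ok", "n/a", "self-hostable", "fail"),
  "cve-scan": (v) => v === "clean" || v === "fail" || CVE_ID.test(v),
  "named-consumer": oneOf("pass", "fail"),
};

const TOP_LEVEL_RULES = {
  package: isString,
  version: isString,
  tier: oneOf("app", "feature", "core"),
  decision: oneOf("approved", "rejected"),
  date: (v) => isString(v) && /^\d{4}-\d{2}-\d{2}/.test(v) && !Number.isNaN(Date.parse(v)),
  deciders: isStringArray,
  adr: (v) => v === null || /^adr-\d{3}$/.test(v),
  "verification-commands": isStringArray,
};

// Returns a list of human-readable errors; an empty list means the trace is valid.
function validateTrace(frontmatter) {
  const fm = frontmatter && typeof frontmatter === "object" ? frontmatter : {};
  const errors = [];
  for (const [key, ok] of Object.entries(TOP_LEVEL_RULES)) {
    if (!ok(fm[key])) errors.push(`invalid or missing field: ${key}`);
  }
  if ("accepted-cves" in fm && !isStringArray(fm["accepted-cves"])) {
    errors.push("invalid field: accepted-cves");
  }
  const raw = fm["filter-results"];
  const results = raw && typeof raw === "object" ? raw : {};
  for (const [key, ok] of Object.entries(FILTER_RULES)) {
    if (!ok(results[key])) errors.push(`invalid or missing filter result: ${key}`);
  }
  for (const key of Object.keys(results)) {
    if (!(key in FILTER_RULES)) errors.push(`unknown filter: ${key}`);
  }
  return errors;
}
```

Because the enums are taken literally from the listing, any extra value (for example a sentinel for a skipped filter) would be reported as invalid until the schema names it.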
Headings (machine-checkable order): one ## Filter: <name> per filter +
one ## Prompt: <name> per prompt, in the order listed in ADR-022.
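A heading-order check of this kind can be sketched in a few lines. The expected order lives in ADR-022 and is not reproduced in this PRD, so the function below takes it as a parameter; checkHeadingOrder is a hypothetical name and the heading names in the usage example are placeholders.

```javascript
// Collect "## Filter: <name>" and "## Prompt: <name>" headings in document
// order and compare them against the expected sequence.
function checkHeadingOrder(markdown, expected) {
  const found = markdown
    .split("\n")
    .map((line) => /^## (Filter|Prompt): (.+?)\s*$/.exec(line))
    .filter(Boolean)
    .map((match) => `${match[1]}: ${match[2]}`);
  const ok =
    found.length === expected.length &&
    found.every((heading, i) => heading === expected[i]);
  return { ok, found };
}

// Usage with placeholder names (the real list comes from ADR-022):
const sample = "## Filter: license\nMIT.\n\n## Filter: types\nNative.\n\n## Prompt: alternatives\nNone.\n";
const report = checkHeadingOrder(sample, [
  "Filter: license",
  "Filter: types",
  "Prompt: alternatives",
]);
```

Returning the headings that were found alongside the verdict lets the caller print a useful diff when the order is wrong.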
Skill fail behavior — collect-cheap-skip-expensive. The four cheap structural filters (license, types, shadow-check, boundary-fit) always run to completion. The four expensive filters (maintenance, CVE scan, EU residency, named-consumer) short-circuit after the first reject. The trace records which filters ran and which were skipped, with a "skipped because earlier filter already rejected" sentinel value.
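The collect-cheap-skip-expensive order can be sketched as a two-phase loop. This is one reading of the paragraph above, stated as an assumption: a reject from any cheap filter also causes every expensive filter to be skipped. The runFilters name and the { value, rejects } result shape are hypothetical, and the filters are synchronous here although real ones (such as a CVE scan) would be asynchronous.

```javascript
const SKIPPED = "skipped because earlier filter already rejected";

// cheap / expensive: arrays of [name, run] pairs, where run(input) returns
// { value, rejects }. Cheap filters always run; expensive filters stop
// running once any earlier filter has rejected.
function runFilters(cheap, expensive, input) {
  const results = {};
  let rejected = false;

  for (const [name, run] of cheap) {
    const outcome = run(input);
    results[name] = outcome.value;
    if (outcome.rejects) rejected = true; // keep going: cheap results are all collected
  }

  for (const [name, run] of expensive) {
    if (rejected) {
      results[name] = SKIPPED; // record the skip instead of paying for the check
      continue;
    }
    const outcome = run(input);
    results[name] = outcome.value;
    if (outcome.rejects) rejected = true;
  }

  return { rejected, results };
}
```

Collecting every cheap result even after a reject means a rejection trace still lists all structural problems at once, while the slow checks are only paid for when the library is still a candidate.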
Pre-commit hook decision-state check — beyond presence, the script also
verifies that the trace's decision matches the dep status: a dep listed in
package.json requires decision: approved; a trace with decision: rejected
that names a dep that's also in the package.json is a hard fail (rejected
libraries cannot ship).
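The decision-state rule reduces to a small lookup once the trace has been parsed. A minimal sketch, assuming the trace is available as an object with a decision field (or null when no trace is staged); checkDecisionState and the message wording are hypothetical.

```javascript
// Decide whether a dependency may ship, given whether it is listed in
// package.json and the parsed trace for it (null when no trace exists).
function checkDecisionState(depName, listedInPackageJson, trace) {
  if (!listedInPackageJson) return { ok: true }; // a rejection trace on its own is a valid record
  if (!trace) {
    return { ok: false, reason: `${depName}: no trace found; run /evaluate-library` };
  }
  if (trace.decision === "rejected") {
    return { ok: false, reason: `${depName}: rejected libraries cannot ship` };
  }
  if (trace.decision !== "approved") {
    return { ok: false, reason: `${depName}: unknown decision "${trace.decision}"` };
  }
  return { ok: true };
}
```

A rejected trace with no matching entry in package.json passes, which keeps rejection traces usable as first-class records.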
Conformance system composition — no new use cases, controllers, manifest entries, audits, events, or jobs. This PRD is a workflow/policy implementation, not a feature-domain change. The conformance gates apply only to the new TypeScript/JS modules (Zod schema module + check script) — they get standard vitest coverage.
Testing decisions
- scripts/library-decisions/schema.mjs: unit tests covering: valid trace parses round-trip; missing required field rejected; unknown filter rejected; invalid enum value rejected.
- scripts/library-decisions/check.mjs: integration tests covering: new feature-tier dep without trace → fail with exit 1; new feature-tier dep with approved trace → pass; new feature-tier dep with rejected trace listed in package.json → fail; new app-tier dep (no trace required) → pass; new devdep → pass (devdeps exempt); multi-file staged diff with mixed pass/fail → fail with per-package report; non-runtime dep (peerDependencies only) → pass. Use a temp git repo as the test fixture.
- The skill: no automated test in the conformance sense (it's a prose runbook for an agent). The success criterion is that running it against trpc-to-openapi produces the documented rejection trace; verified manually during the epic.
- Generator pre-shipped traces: the existing turbo/generators/__snapshots__/core-package/<name>.snapshot.json snapshot test extends to cover the new trace files. A failing snapshot is the gate.
- Claude hook scripts: bash smoke tests that pipe a mocked Claude hook payload ({ "tool_input": { "command": "pnpm add foo" } }) into the script and assert stderr contains the skill-reminder marker. Same style as generator-first-nudge.sh's existing tests if any (check during impl).
- Prior art: mirror the test patterns from scripts/work/state-sync-guard.mjs (the pre-commit _state.json check) for the new check script; the fixture/assertion shape carries over directly.
- Coverage bands: the new scripts under scripts/library-decisions/ are not feature packages, so they don't have a feature.manifest.ts and aren't bound by per-layer coverage thresholds. Add them to the diff-coverage exception list only if the diff-coverage gate is too strict on script files; default is they should hit 100% statement coverage because they're small.
Open questions
- Q1: CVE-accepted-risk mechanism: per-trace accepted-cves: ["CVE-XXXX-YYYY"] frontmatter array vs central docs/library-decisions/_cve-allowlist.md? Recommended: per-trace. Acceptance is library-scoped, not global; a central file becomes a god-object that no agent reads in full.
- Q2: Should the policy also gate peerDependencies additions, or only dependencies? Recommended: only dependencies. Peer deps are a contract, not a runtime addition; if a feature declares a peer, the actual runtime adopter (an app or another package) is the one whose dependencies the policy catches.
- Q3: Should the backfill be one commit per trace or one batch commit per ADR cluster? Recommended: one commit per cluster (4 commits total), each with conventional chore(deps): backfill library traces for <cluster>. Avoids both extremes (1 mega-commit and 10 noisy single-trace commits).
- Q4: Should the skill be permitted to write the trace before the user/agent approves the final decision, or must the final write be a separate explicit step? Recommended: skill writes the trace unconditionally at the end of evaluation; the trace IS the record, including for rejections. No "draft" state.
- Q5: Where do the Claude hooks register themselves? Investigate whether the existing .claude/hooks/*.sh scripts are referenced by some kind of settings file or auto-discovered. Confirm during the first slice that adds the new hook script, and match the existing registration pattern.
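For Q1, the recommended per-trace allowlist keeps the acceptance check local to one file. A minimal sketch, assuming the audit step has already produced a list of advisory ids; unacceptedCves is a hypothetical name and the ids in the usage example are placeholders, not real advisories.

```javascript
// Return the advisory ids that this trace has not explicitly accepted.
function unacceptedCves(foundIds, trace) {
  const accepted = new Set(trace["accepted-cves"] || []);
  return foundIds.filter((id) => !accepted.has(id));
}

// Usage with placeholder ids:
const remaining = unacceptedCves(
  ["CVE-0000-0001", "CVE-0000-0002"],
  { package: "example-lib", "accepted-cves": ["CVE-0000-0001"] }
);
```

Since accepted-cves is optional in the schema, a trace without the field accepts nothing and every finding is returned.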
Out of scope (deferred)
- Periodic re-verification. Running each trace's verification-commands on a schedule (nightly?) to detect drift: new CVEs, license changes, upstream abandonment. Deserves its own PRD; would compose with pnpm fallow as a sixth gate.
- Auto-generated PR comments that summarize the trace for human reviewers. Nice-to-have once the policy has lived for a quarter.
- pnpm libs <subcommand> ergonomic CLI: wrapping check.mjs as pnpm libs check, plus pnpm libs list, pnpm libs orphans, etc. Defer until the raw script proves the workflow.
Further notes
- Anchored by ADR-022 — Library evaluation policy. Read that first.
- Glossary entries for Library trace and Pre-shipped trace landed during the 2026-05-14 grill session that produced ADR-022.
- Conversation provenance — the 2026-05-14 grill-with-docs session that produced this PRD is captured in the session transcript; ADR-022 cites the OpenAPI near-miss as concrete catalyst.