--- id: library-evaluation-policy title: Library evaluation policy — skill, traces, enforcement stack type: prd status: approved author: danijel created: 2026-05-14T00:00:00Z updated: 2026-05-14T19:16:52.691Z adr: adr-022 --- ## Problem This template ships with a deliberately narrow third-party surface — every feature package today holds the same 6 runtime deps and nothing else. That discipline is uncodified. New dependencies enter via `pnpm add ` with no checkpoint between intent and lockfile, and three recent signals show the gap: 1. The 2026-05-14 grill session nearly added `trpc-to-openapi` + `zod-to-json-schema` - a build-time generator before someone asked "who calls this code path?" The honest answer was "nobody — all callers are TypeScript via `createCaller`." 2. ADR-002 (Inversify), ADR-014 (Sentry), ADR-017 (OpenTelemetry) each record library decisions, but every record was written _after_ adoption. No mechanism exists to catch a bad choice before it becomes a migration project. 3. The repo is EU-resident and GDPR-bound. A library that defaults to a US-only SaaS endpoint silently moves user data out of the EU the moment it's imported with defaults. Nothing currently flags this. ADR-022 codifies the policy. This PRD implements it. ## Goal A four-layer enforcement stack — Claude hook, skill, pre-commit hook, sandcastle reviewer prompt — that makes every new runtime dependency in a feature- or core-tier package produce a permanent **library trace** at `docs/library-decisions/-.md`, with rejection traces treated as first-class records. ## In scope - The `evaluate-library` skill at `.claude/skills/evaluate-library/SKILL.md` — authoritative agent runbook; walks 8 hard filters + 3 prompts; writes the trace. - The human reading-room guide at `docs/guides/adding-a-library.md` with worked examples (approved + rejected). - The `docs/library-decisions/` directory + `_template.md` schema reference. - A Zod-validated trace-schema module (`scripts/library-decisions/schema.mjs`) shared by the skill, the pre-commit checker, and the generator. - Claude `PreToolUse` hook (`.claude/hooks/library-policy-nudge.sh`) — matches `pnpm add` / `pnpm i ` in Bash invocations; emits skill reminder. - Claude `PostToolUse` hook (also in `library-policy-nudge.sh`, dispatching by event type) — matches `Edit`/`Write` against any `**/package.json`. - Pre-commit hook check script (`scripts/library-decisions/check.mjs`) wired into `.husky/pre-commit`. Blocks the commit when a new runtime dep is staged in a feature/core package and no sibling trace file is staged. - Sandcastle reviewer prompt update (`.sandcastle/reviewer.prompt.md`) — the reviewer agent runs the same check before issuing approve/reject. - Optional-cores generator templates (`turbo/generators/templates/core-package/`) emit pre-shipped traces per direct dep, dated at generation time, marked `decision: approved`, citing the relevant ADR (015/016/018). Five generators updated: `events`, `realtime`, `audit`, `trpc`, `ui`. - Backfill traces for every existing runtime dependency in feature- and core- tier packages, dated 2026-05-14, citing existing ADRs (002/014/017) where applicable. Approx 10 traces. - `CLAUDE.md` "Key Conventions" gets a one-line bullet pointing to ADR-022 + the guide. ## Out of scope - Transitive dependency tracing — `pnpm audit` already handles recursive scanning. - Bundle-size analysis — Vercel / Vite build output already reports this. - Auto-removal of approved-then-unused deps — `pnpm fallow` territory. - License auto-enforcement at the lockfile layer (license-checker plugins) — defer until the policy has run for some time and we know where it leaks. - Anything app-tier. Deps in `apps/*` are author's call per the tier model. - Devdeps in any tier. Only `dependencies` (runtime) require traces. ## Constraints - **ADR-022** is the source of truth. This PRD implements but does not extend it. - **ADR-006 + ADR-010** — the tier trigger maps onto the existing boundary-tag system. No new mental model; ESLint already partitions blast radius. - **ADR-019** — the sandcastle reviewer prompt is one of four enforcement layers. Whatever the agent loop does must compose with the existing prompt shape at `.sandcastle/reviewer.prompt.md`. - **ADR-021** — release-please picks up dependency changes from commit history. The trace file landing in the **same commit** as the `package.json` change is required so release notes correlate cleanly with policy records. - **Conformance system parity** — the enforcement stack mirrors the 5-gate latency pattern from ADR-012. Same vocabulary, same agent feedback loop. - **Conventional Commits** — every commit produced by the implementation follows `(): `. - **`--no-verify` is forbidden** — the bash-guard hook already enforces this; the new pre-commit check inherits that protection. - **Skill must be deterministic from explicit args** — `/evaluate-library --tier --target `. The Claude hook produces exactly this invocation from a `pnpm add` command line. ## Success criteria - `pnpm typecheck && pnpm test && pnpm lint && pnpm conformance && pnpm fallow:audit` pass green at the end of the epic. - `pnpm coverage:diff` covers every changed executable line introduced by the implementation slices. - Attempting to commit a new feature-tier runtime dep without a sibling trace file is blocked by the pre-commit hook with a clear error pointing to the skill. - Running the `evaluate-library` skill against `trpc-to-openapi` (the rejected library from the grill session) produces a `decision: rejected` trace with `named-consumer: fail` and prose citing the conversation, in <90 seconds of agent work. - Running `pnpm turbo gen core-package events` (or any other optional core) emits pre-shipped traces for every direct dep of that core into `docs/library-decisions/`, all `decision: approved` and ADR-cited. - The Claude `PreToolUse` hook fires on `pnpm add ` and emits the skill-reminder system-reminder; does **not** auto-deny. - All existing runtime deps in feature- and core-tier packages have backfilled trace files dated 2026-05-14 in `docs/library-decisions/`. - `CLAUDE.md` Key Conventions includes the one-line policy bullet. - `docs/glossary.md` already includes **Library trace** and **Pre-shipped trace** entries (landed inline during the grill session). ## User stories 1. **As a developer adding a new feature dependency**, I want a deterministic skill that walks me through the 8 filters and 3 prompts in collect-cheap- skip-expensive order, so I don't forget any check and the trace file is written automatically with my answers. 2. **As an agent dispatched against a slice that needs a new dep**, I want the Claude `PreToolUse` hook to inject a system-reminder pointing me at the skill the moment I'm about to run `pnpm add`, so I don't bypass the policy by reflex. 3. **As an agent editing a `package.json` by hand**, I want the Claude `PostToolUse` hook to inject the same reminder, so the policy isn't sidestepped by paste-then-install. 4. **As a reviewer (human or agent)**, I want the pre-commit hook to refuse a commit that adds a runtime dep to a feature/core package without a sibling trace file, so I don't have to remember to check during review. 5. **As a future agent considering a previously-rejected library**, I want to find the rejection trace in `docs/library-decisions/` in <1s of `ls`/`grep`, so I don't re-litigate a decision that has already been made. 6. **As an EU-resident maintainer**, I want the EU-data-residency filter to reject US-only SaaS components by default and force a `self-hostable` or `EU-region-configured` justification in the trace, so user data doesn't leave the EU silently. 7. **As a maintainer scaffolding an optional core via `pnpm turbo gen core-package `**, I want pre-shipped traces emitted for every direct dep of the new core, so the policy is satisfied by construction. 8. **As a security-conscious maintainer**, I want the CVE-scan filter to run `pnpm audit --audit-level=moderate` at evaluation time and snapshot the result + commands into the trace, so I can re-run them later to detect drift. 9. **As an agent reviewing a slice in sandcastle**, I want the reviewer prompt to check for trace presence + correctness, so I can reject incompliant slices without needing a separate workflow. 10. **As a maintainer reading the repo for the first time**, I want `docs/guides/adding-a-library.md` to explain the policy with worked examples (one approved, one rejected), so I understand the why and how before I face the gate myself. ## Implementation decisions **Module sketch** — what lands where, by package and concern (no file paths where prose suffices): - **The skill itself** — `.claude/skills/evaluate-library/` follows the same shape as `to-prd`, `grill-with-docs`, `improve-codebase-architecture`. SKILL.md is authoritative; supporting files (`POLICY.md` mirroring ADR-022, `TRACE-TEMPLATE.md` showing the YAML+headings shape, `EXAMPLES/` worked cases) flesh it out. The skill is invocable via slash command `/evaluate-library`. - **Trace schema module** — a small shared module at `scripts/library-decisions/schema.mjs` exporting (1) a Zod schema for the trace's frontmatter, (2) a parser that reads a trace file and returns the validated frontmatter, (3) a serializer that takes filter results + prose blocks and emits a trace file. Both the skill and the pre-commit checker import this module. Deep module — small interface (parse/serialize/validate), high leverage across the four enforcement layers. - **Pre-commit check script** — `scripts/library-decisions/check.mjs`. Walks `git diff --cached --name-only -- '**/package.json'`, for each file extracts newly-added dep lines via `git diff --cached `, derives the tier from the path, and for each new runtime dep checks that `docs/library-decisions/*-.md` is also staged with `decision: approved`. Exits non-zero with a pointer to the skill if any check fails. Invoked from `.husky/pre-commit` as step 4 (after the existing state-sync guard). - **Claude hooks** — a single `.claude/hooks/library-policy-nudge.sh` that dispatches on `tool_use_type` to handle both `PreToolUse` (Bash with `pnpm add` / `pnpm i ` pattern) and `PostToolUse` (Edit/Write on `**/package.json`). Same style as the existing `generator-first-nudge.sh`. Emits a non-blocking system-reminder to stdout that the harness threads into the agent's next turn. - **Sandcastle reviewer prompt** — append a "Library-trace check" section to `.sandcastle/reviewer.prompt.md`. The reviewer runs `node scripts/library-decisions/check.mjs --staged-against ` in the sandbox before issuing its verdict. - **Generator templates** — each `turbo/generators/templates/core-package//` gets a `docs/library-decisions/` subtree with one `.md` per direct dep of that core. The generator copies these alongside the core's package.json into the workspace. Frozen via the existing `turbo/generators/__snapshots__/core-package/.snapshot.json` mechanism. - **Backfill traces** — write one trace per existing runtime dep in feature/ core tier. The deps cluster naturally by ADR provenance: ADR-002 cluster (Inversify + reflect-metadata), ADR-014 cluster (Sentry family), ADR-017 cluster (OpenTelemetry family), and the un-cited cluster (`payload`, `@trpc/server`, `zod`, `superjson`, plus any others surfaced by inventory). All traces dated 2026-05-14, `decision: approved`, ADR citation where the cluster maps to one. - **`CLAUDE.md` update** — one bullet in Key Conventions: _"New runtime dependencies in feature- or core-tier packages require a trace at `docs/library-decisions/-.md` produced by the `evaluate-library` skill — see ADR-022."_ **Trace schema (frontmatter)** — Zod schema (lifted from ADR-022 §4): ``` package: string version: string // semver range as written in package.json tier: "app" | "feature" | "core" decision: "approved" | "rejected" date: ISO date string deciders: string[] adr: string | null // "adr-NNN" slug or null filter-results: { license: SPDX-id-string types: "native" | `@types/${string}` | "none" maintenance: "active" | "dormant" | "abandoned" boundary-fit: "pass" | "fail" shadow-check: "pass" | "fail" | `shadows ${string}` eu-residency: "ok" | "n/a" | "self-hostable" | "fail" cve-scan: "clean" | `${cve-id}` | "fail" named-consumer: "pass" | "fail" } verification-commands: string[] accepted-cves?: string[] // optional per-trace allowlist ``` Headings (machine-checkable order): one `## Filter: ` per filter + one `## Prompt: ` per prompt, in the order listed in ADR-022. **Skill fail behavior** — collect-cheap-skip-expensive. The four cheap structural filters (license, types, shadow-check, boundary-fit) always run to completion. The four expensive filters (maintenance, CVE scan, EU residency, named-consumer) short-circuit after the first reject. The trace records which filters ran and which were skipped, with a "skipped because earlier filter already rejected" sentinel value. **Pre-commit hook decision-state check** — beyond presence, the script also verifies that the trace's `decision` matches the dep status: a dep listed in `package.json` requires `decision: approved`; a trace with `decision: rejected` that names a dep that's also in the package.json is a hard fail (rejected libraries cannot ship). **Conformance system composition** — no new use cases, controllers, manifest entries, audits, events, or jobs. This PRD is a workflow/policy implementation, not a feature-domain change. The conformance gates apply only to the new TypeScript/JS modules (Zod schema module + check script) — they get standard vitest coverage. ## Testing decisions - **`scripts/library-decisions/schema.mjs`** — unit tests covering: valid trace parses round-trip; missing required field rejected; unknown filter rejected; invalid enum value rejected. - **`scripts/library-decisions/check.mjs`** — integration tests covering: new feature-tier dep without trace → fail with exit 1; new feature-tier dep with approved trace → pass; new feature-tier dep with rejected trace listed in package.json → fail; new app-tier dep (no trace required) → pass; new devdep → pass (devdeps exempt); multi-file staged diff with mixed pass/fail → fail with per-package report; non-runtime dep (`peerDependencies` only) → pass. Use a temp git repo as the test fixture. - **The skill** — no automated test in the conformance sense (it's a prose runbook for an agent). The success criterion is that running it against `trpc-to-openapi` produces the documented rejection trace; verified manually during the epic. - **Generator pre-shipped traces** — the existing `turbo/generators/__snapshots__/core-package/.snapshot.json` snapshot test extends to cover the new trace files. A failing snapshot is the gate. - **Claude hook scripts** — bash smoke tests that pipe a mocked Claude hook payload (`{ "tool_input": { "command": "pnpm add foo" } }`) into the script and assert stderr contains the skill-reminder marker. Same style as `generator-first-nudge.sh`'s existing tests if any (check during impl). - **Prior art** — mirror the test patterns from `scripts/work/state-sync-guard.mjs` (the pre-commit `_state.json` check) for the new check script; the fixture/assertion shape carries over directly. - **Coverage bands** — the new scripts under `scripts/library-decisions/` are not feature packages, so they don't have a `feature.manifest.ts` and aren't bound by per-layer coverage thresholds. Add them to the diff-coverage exception list **only if** the diff-coverage gate is too strict on script files; default is they should hit 100% statement coverage because they're small. ## Open questions - **Q1:** CVE-accepted-risk mechanism — per-trace `accepted-cves: ["CVE-XXXX-YYYY"]` frontmatter array vs central `docs/library-decisions/_cve-allowlist.md`? — **Recommended:** per-trace. Acceptance is library-scoped, not global; a central file becomes a god-object that no agent reads in full. - **Q2:** Should the policy also gate `peerDependencies` additions, or only `dependencies`? — **Recommended:** only `dependencies`. Peer deps are a contract, not a runtime addition; if a feature declares a peer, the actual runtime adopter (an app or another package) is the one whose `dependencies` the policy catches. - **Q3:** Should the backfill be one commit per trace or one batch commit per ADR cluster? — **Recommended:** one commit per cluster (4 commits total), each with conventional `chore(deps): backfill library traces for `. Avoids both extremes (1 mega-commit and 10 noisy single-trace commits). - **Q4:** Should the skill be permitted to **write the trace** before the user/agent approves the final decision, or must the final write be a separate explicit step? — **Recommended:** skill writes the trace unconditionally at the end of evaluation; the trace IS the record, including for rejections. No "draft" state. - **Q5:** Where do the Claude hooks register themselves? — **Investigate:** the existing `.claude/hooks/*.sh` are referenced by some kind of settings file or auto-discovered. Confirm during the first slice that adds the new hook script and matches the existing registration pattern. ## Out of scope (deferred) - **Periodic re-verification.** Running each trace's `verification-commands` on a schedule (nightly?) to detect drift — new CVEs, license changes, upstream abandonment. Deserves its own PRD; would compose with `pnpm fallow` as a sixth gate. - **Auto-generated PR comments** that summarize the trace for human reviewers. Nice-to-have once the policy has lived for a quarter. - **`pnpm libs ` ergonomic CLI** — wrapping `check.mjs` as `pnpm libs check`, plus `pnpm libs list`, `pnpm libs orphans`, etc. Defer until the raw script proves the workflow. ## Further notes - **Anchored by ADR-022** — Library evaluation policy. Read that first. - **Glossary entries** for **Library trace** and **Pre-shipped trace** landed during the 2026-05-14 grill session that produced ADR-022. - **Conversation provenance** — the 2026-05-14 grill-with-docs session that produced this PRD is captured in the session transcript; ADR-022 cites the OpenAPI near-miss as concrete catalyst.