agentic-dev/docs/work/prds/library-evaluation-policy.prd.md

---
id: library-evaluation-policy
title: Library evaluation policy — skill, traces, enforcement stack
type: prd
status: approved
author: danijel
created: 2026-05-14T00:00:00Z
updated: 2026-05-14T19:16:52.691Z
adr: adr-022
---

## Problem

This template ships with a deliberately narrow third-party surface — every
feature package today holds the same 6 runtime deps and nothing else. That
discipline is uncodified. New dependencies enter via `pnpm add <pkg>` with no
checkpoint between intent and lockfile, and three recent signals show the gap:

1. The 2026-05-14 grill session nearly added `trpc-to-openapi` + `zod-to-json-schema`
   - a build-time generator before someone asked "who calls this code path?"
     The honest answer was "nobody — all callers are TypeScript via `createCaller`."
2. ADR-002 (Inversify), ADR-014 (Sentry), ADR-017 (OpenTelemetry) each record
   library decisions, but every record was written _after_ adoption. No mechanism
   exists to catch a bad choice before it becomes a migration project.
3. The repo is EU-resident and GDPR-bound. A library that defaults to a US-only
   SaaS endpoint silently moves user data out of the EU the moment it's
   imported with defaults. Nothing currently flags this.

ADR-022 codifies the policy. This PRD implements it.

## Goal

A four-layer enforcement stack — Claude hook, skill, pre-commit hook, sandcastle
reviewer prompt — that makes every new runtime dependency in a feature- or
core-tier package produce a permanent **library trace** at
`docs/library-decisions/<YYYY-MM-DD>-<package-name>.md`, with rejection traces
treated as first-class records.

## In scope

- The `evaluate-library` skill at `.claude/skills/evaluate-library/SKILL.md` —
  authoritative agent runbook; walks 8 hard filters + 3 prompts; writes the trace.
- The human reading-room guide at `docs/guides/adding-a-library.md` with worked
  examples (approved + rejected).
- The `docs/library-decisions/` directory + `_template.md` schema reference.
- A Zod-validated trace-schema module (`scripts/library-decisions/schema.mjs`)
  shared by the skill, the pre-commit checker, and the generator.
- Claude `PreToolUse` hook (`.claude/hooks/library-policy-nudge.sh`) — matches
  `pnpm add` / `pnpm i <pkg>` in Bash invocations; emits skill reminder.
- Claude `PostToolUse` hook (also in `library-policy-nudge.sh`, dispatching by
  event type) — matches `Edit`/`Write` against any `**/package.json`.
- Pre-commit hook check script (`scripts/library-decisions/check.mjs`) wired
  into `.husky/pre-commit`. Blocks the commit when a new runtime dep is staged
  in a feature/core package and no sibling trace file is staged.
- Sandcastle reviewer prompt update (`.sandcastle/reviewer.prompt.md`) — the
  reviewer agent runs the same check before issuing approve/reject.
- Optional-cores generator templates (`turbo/generators/templates/core-package/`)
  emit pre-shipped traces per direct dep, dated at generation time, marked
  `decision: approved`, citing the relevant ADR (015/016/018). Five generators
  updated: `events`, `realtime`, `audit`, `trpc`, `ui`.
- Backfill traces for every existing runtime dependency in feature- and core-
  tier packages, dated 2026-05-14, citing existing ADRs (002/014/017) where
  applicable. Approx 10 traces.
- `CLAUDE.md` "Key Conventions" gets a one-line bullet pointing to ADR-022 +
  the guide.

## Out of scope

- Transitive dependency tracing — `pnpm audit` already handles recursive scanning.
- Bundle-size analysis — Vercel / Vite build output already reports this.
- Auto-removal of approved-then-unused deps — `pnpm fallow` territory.
- License auto-enforcement at the lockfile layer (license-checker plugins) —
  defer until the policy has run for some time and we know where it leaks.
- Anything app-tier. Deps in `apps/*` are author's call per the tier model.
- Devdeps in any tier. Only `dependencies` (runtime) require traces.

## Constraints

- **ADR-022** is the source of truth. This PRD implements but does not extend it.
- **ADR-006 + ADR-010** — the tier trigger maps onto the existing boundary-tag
  system. No new mental model; ESLint already partitions blast radius.
- **ADR-019** — the sandcastle reviewer prompt is one of four enforcement
  layers. Whatever the agent loop does must compose with the existing prompt
  shape at `.sandcastle/reviewer.prompt.md`.
- **ADR-021** — release-please picks up dependency changes from commit history.
  The trace file landing in the **same commit** as the `package.json` change
  is required so release notes correlate cleanly with policy records.
- **Conformance system parity** — the enforcement stack mirrors the 5-gate
  latency pattern from ADR-012. Same vocabulary, same agent feedback loop.
- **Conventional Commits** — every commit produced by the implementation
  follows `<type>(<scope>): <subject>`.
- **`--no-verify` is forbidden** — the bash-guard hook already enforces this;
  the new pre-commit check inherits that protection.
- **Skill must be deterministic from explicit args** — `/evaluate-library <name>
--tier <feature|core|app> --target <package-path>`. The Claude hook produces
  exactly this invocation from a `pnpm add` command line.

## Success criteria

- `pnpm typecheck && pnpm test && pnpm lint && pnpm conformance && pnpm fallow:audit`
  pass green at the end of the epic.
- `pnpm coverage:diff` covers every changed executable line introduced by
  the implementation slices.
- Attempting to commit a new feature-tier runtime dep without a sibling trace
  file is blocked by the pre-commit hook with a clear error pointing to the skill.
- Running the `evaluate-library` skill against `trpc-to-openapi` (the rejected
  library from the grill session) produces a `decision: rejected` trace with
  `named-consumer: fail` and prose citing the conversation, in <90 seconds of
  agent work.
- Running `pnpm turbo gen core-package events` (or any other optional core)
  emits pre-shipped traces for every direct dep of that core into
  `docs/library-decisions/`, all `decision: approved` and ADR-cited.
- The Claude `PreToolUse` hook fires on `pnpm add <anything>` and emits the
  skill-reminder system-reminder; does **not** auto-deny.
- All existing runtime deps in feature- and core-tier packages have backfilled
  trace files dated 2026-05-14 in `docs/library-decisions/`.
- `CLAUDE.md` Key Conventions includes the one-line policy bullet.
- `docs/glossary.md` already includes **Library trace** and **Pre-shipped
  trace** entries (landed inline during the grill session).

## User stories

1. **As a developer adding a new feature dependency**, I want a deterministic
   skill that walks me through the 8 filters and 3 prompts in collect-cheap-
   skip-expensive order, so I don't forget any check and the trace file is
   written automatically with my answers.
2. **As an agent dispatched against a slice that needs a new dep**, I want the
   Claude `PreToolUse` hook to inject a system-reminder pointing me at the
   skill the moment I'm about to run `pnpm add`, so I don't bypass the policy
   by reflex.
3. **As an agent editing a `package.json` by hand**, I want the Claude
   `PostToolUse` hook to inject the same reminder, so the policy isn't
   sidestepped by paste-then-install.
4. **As a reviewer (human or agent)**, I want the pre-commit hook to refuse a
   commit that adds a runtime dep to a feature/core package without a sibling
   trace file, so I don't have to remember to check during review.
5. **As a future agent considering a previously-rejected library**, I want
   to find the rejection trace in `docs/library-decisions/` in <1s of
   `ls`/`grep`, so I don't re-litigate a decision that has already been made.
6. **As an EU-resident maintainer**, I want the EU-data-residency filter to
   reject US-only SaaS components by default and force a `self-hostable` or
   `EU-region-configured` justification in the trace, so user data doesn't
   leave the EU silently.
7. **As a maintainer scaffolding an optional core via
   `pnpm turbo gen core-package <name>`**, I want pre-shipped traces emitted
   for every direct dep of the new core, so the policy is satisfied by
   construction.
8. **As a security-conscious maintainer**, I want the CVE-scan filter to run
   `pnpm audit --audit-level=moderate` at evaluation time and snapshot the
   result + commands into the trace, so I can re-run them later to detect drift.
9. **As an agent reviewing a slice in sandcastle**, I want the reviewer prompt
   to check for trace presence + correctness, so I can reject incompliant
   slices without needing a separate workflow.
10. **As a maintainer reading the repo for the first time**, I want
    `docs/guides/adding-a-library.md` to explain the policy with worked
    examples (one approved, one rejected), so I understand the why and how
    before I face the gate myself.

## Implementation decisions

**Module sketch** — what lands where, by package and concern (no file paths
where prose suffices):

- **The skill itself** — `.claude/skills/evaluate-library/` follows the same
  shape as `to-prd`, `grill-with-docs`, `improve-codebase-architecture`.
  SKILL.md is authoritative; supporting files (`POLICY.md` mirroring ADR-022,
  `TRACE-TEMPLATE.md` showing the YAML+headings shape, `EXAMPLES/` worked
  cases) flesh it out. The skill is invocable via slash command
  `/evaluate-library`.
- **Trace schema module** — a small shared module at
  `scripts/library-decisions/schema.mjs` exporting (1) a Zod schema for the
  trace's frontmatter, (2) a parser that reads a trace file and returns the
  validated frontmatter, (3) a serializer that takes filter results + prose
  blocks and emits a trace file. Both the skill and the pre-commit checker
  import this module. Deep module — small interface (parse/serialize/validate),
  high leverage across the four enforcement layers.
- **Pre-commit check script** — `scripts/library-decisions/check.mjs`. Walks
  `git diff --cached --name-only -- '**/package.json'`, for each file extracts
  newly-added dep lines via `git diff --cached <file>`, derives the tier from
  the path, and for each new runtime dep checks that
  `docs/library-decisions/*-<name>.md` is also staged with `decision: approved`.
  Exits non-zero with a pointer to the skill if any check fails. Invoked from
  `.husky/pre-commit` as step 4 (after the existing state-sync guard).
- **Claude hooks** — a single `.claude/hooks/library-policy-nudge.sh` that
  dispatches on `tool_use_type` to handle both `PreToolUse` (Bash with
  `pnpm add` / `pnpm i <pkg>` pattern) and `PostToolUse` (Edit/Write on
  `**/package.json`). Same style as the existing `generator-first-nudge.sh`.
  Emits a non-blocking system-reminder to stdout that the harness threads
  into the agent's next turn.
- **Sandcastle reviewer prompt** — append a "Library-trace check" section
  to `.sandcastle/reviewer.prompt.md`. The reviewer runs
  `node scripts/library-decisions/check.mjs --staged-against <base>` in the
  sandbox before issuing its verdict.
- **Generator templates** — each `turbo/generators/templates/core-package/<name>/`
  gets a `docs/library-decisions/` subtree with one `.md` per direct dep of
  that core. The generator copies these alongside the core's package.json
  into the workspace. Frozen via the existing
  `turbo/generators/__snapshots__/core-package/<name>.snapshot.json` mechanism.
- **Backfill traces** — write one trace per existing runtime dep in feature/
  core tier. The deps cluster naturally by ADR provenance: ADR-002 cluster
  (Inversify + reflect-metadata), ADR-014 cluster (Sentry family), ADR-017
  cluster (OpenTelemetry family), and the un-cited cluster (`payload`,
  `@trpc/server`, `zod`, `superjson`, plus any others surfaced by inventory).
  All traces dated 2026-05-14, `decision: approved`, ADR citation where the
  cluster maps to one.
- **`CLAUDE.md` update** — one bullet in Key Conventions: _"New runtime
  dependencies in feature- or core-tier packages require a trace at
  `docs/library-decisions/<date>-<name>.md` produced by the `evaluate-library`
  skill — see ADR-022."_

**Trace schema (frontmatter)** — Zod schema (lifted from ADR-022 §4):

```
package: string
version: string                  // semver range as written in package.json
tier: "app" | "feature" | "core"
decision: "approved" | "rejected"
date: ISO date string
deciders: string[]
adr: string | null               // "adr-NNN" slug or null
filter-results: {
  license: SPDX-id-string
  types: "native" | `@types/${string}` | "none"
  maintenance: "active" | "dormant" | "abandoned"
  boundary-fit: "pass" | "fail"
  shadow-check: "pass" | "fail" | `shadows ${string}`
  eu-residency: "ok" | "n/a" | "self-hostable" | "fail"
  cve-scan: "clean" | `${cve-id}` | "fail"
  named-consumer: "pass" | "fail"
}
verification-commands: string[]
accepted-cves?: string[]         // optional per-trace allowlist
```

Headings (machine-checkable order): one `## Filter: <name>` per filter +
one `## Prompt: <name>` per prompt, in the order listed in ADR-022.

**Skill fail behavior** — collect-cheap-skip-expensive. The four cheap
structural filters (license, types, shadow-check, boundary-fit) always run
to completion. The four expensive filters (maintenance, CVE scan, EU residency,
named-consumer) short-circuit after the first reject. The trace records which
filters ran and which were skipped, with a "skipped because earlier filter
already rejected" sentinel value.

**Pre-commit hook decision-state check** — beyond presence, the script also
verifies that the trace's `decision` matches the dep status: a dep listed in
`package.json` requires `decision: approved`; a trace with `decision: rejected`
that names a dep that's also in the package.json is a hard fail (rejected
libraries cannot ship).

**Conformance system composition** — no new use cases, controllers, manifest
entries, audits, events, or jobs. This PRD is a workflow/policy implementation,
not a feature-domain change. The conformance gates apply only to the new
TypeScript/JS modules (Zod schema module + check script) — they get standard
vitest coverage.

## Testing decisions

- **`scripts/library-decisions/schema.mjs`** — unit tests covering: valid
  trace parses round-trip; missing required field rejected; unknown filter
  rejected; invalid enum value rejected.
- **`scripts/library-decisions/check.mjs`** — integration tests covering: new
  feature-tier dep without trace → fail with exit 1; new feature-tier dep with
  approved trace → pass; new feature-tier dep with rejected trace listed in
  package.json → fail; new app-tier dep (no trace required) → pass; new devdep
  → pass (devdeps exempt); multi-file staged diff with mixed pass/fail → fail
  with per-package report; non-runtime dep (`peerDependencies` only) → pass.
  Use a temp git repo as the test fixture.
- **The skill** — no automated test in the conformance sense (it's a prose
  runbook for an agent). The success criterion is that running it against
  `trpc-to-openapi` produces the documented rejection trace; verified manually
  during the epic.
- **Generator pre-shipped traces** — the existing
  `turbo/generators/__snapshots__/core-package/<name>.snapshot.json` snapshot
  test extends to cover the new trace files. A failing snapshot is the gate.
- **Claude hook scripts** — bash smoke tests that pipe a mocked Claude
  hook payload (`{ "tool_input": { "command": "pnpm add foo" } }`) into the
  script and assert stderr contains the skill-reminder marker. Same style as
  `generator-first-nudge.sh`'s existing tests if any (check during impl).
- **Prior art** — mirror the test patterns from `scripts/work/state-sync-guard.mjs`
  (the pre-commit `_state.json` check) for the new check script; the
  fixture/assertion shape carries over directly.
- **Coverage bands** — the new scripts under `scripts/library-decisions/` are
  not feature packages, so they don't have a `feature.manifest.ts` and aren't
  bound by per-layer coverage thresholds. Add them to the diff-coverage
  exception list **only if** the diff-coverage gate is too strict on script
  files; default is they should hit 100% statement coverage because they're
  small.

## Open questions

- **Q1:** CVE-accepted-risk mechanism — per-trace `accepted-cves: ["CVE-XXXX-YYYY"]`
  frontmatter array vs central `docs/library-decisions/_cve-allowlist.md`? —
  **Recommended:** per-trace. Acceptance is library-scoped, not global; a
  central file becomes a god-object that no agent reads in full.
- **Q2:** Should the policy also gate `peerDependencies` additions, or only
  `dependencies`? — **Recommended:** only `dependencies`. Peer deps are a
  contract, not a runtime addition; if a feature declares a peer, the actual
  runtime adopter (an app or another package) is the one whose `dependencies`
  the policy catches.
- **Q3:** Should the backfill be one commit per trace or one batch commit per
  ADR cluster? — **Recommended:** one commit per cluster (4 commits total),
  each with conventional `chore(deps): backfill library traces for <cluster>`.
  Avoids both extremes (1 mega-commit and 10 noisy single-trace commits).
- **Q4:** Should the skill be permitted to **write the trace** before the
  user/agent approves the final decision, or must the final write be a
  separate explicit step? — **Recommended:** skill writes the trace
  unconditionally at the end of evaluation; the trace IS the record, including
  for rejections. No "draft" state.
- **Q5:** Where do the Claude hooks register themselves? — **Investigate:**
  the existing `.claude/hooks/*.sh` are referenced by some kind of settings
  file or auto-discovered. Confirm during the first slice that adds the new
  hook script and matches the existing registration pattern.

## Out of scope (deferred)

- **Periodic re-verification.** Running each trace's `verification-commands`
  on a schedule (nightly?) to detect drift — new CVEs, license changes,
  upstream abandonment. Deserves its own PRD; would compose with `pnpm fallow`
  as a sixth gate.
- **Auto-generated PR comments** that summarize the trace for human reviewers.
  Nice-to-have once the policy has lived for a quarter.
- **`pnpm libs <subcommand>` ergonomic CLI** — wrapping `check.mjs` as
  `pnpm libs check`, plus `pnpm libs list`, `pnpm libs orphans`, etc. Defer
  until the raw script proves the workflow.

## Further notes

- **Anchored by ADR-022** — Library evaluation policy. Read that first.
- **Glossary entries** for **Library trace** and **Pre-shipped trace** landed
  during the 2026-05-14 grill session that produced ADR-022.
- **Conversation provenance** — the 2026-05-14 grill-with-docs session that
  produced this PRD is captured in the session transcript; ADR-022 cites the
  OpenAPI near-miss as concrete catalyst.