Files
agentic-dev/docs/work/prds/library-evaluation-policy.prd.md
Danijel Martinek bae4b66fa4 refactor(work): drop date prefixes + move _state.json into _system/
Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.

Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
    -> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
    -> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
    -> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
    -> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
    -> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
    -> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
    -> docs/work/prds/ci-security-and-supply-chain.prd.md

Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.

System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
  new path.

template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).

Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
  ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
  bare-slug id field + ISO created: timestamp.

Doc reference updates: glossary, runbook, agent-first-workflow-
and-conformance, reviewer prompt, ADR-020, ADR-022, ADR-023 all
point at the new paths/slugs.
2026-05-14 21:16:51 +02:00

18 KiB

id, title, type, status, author, created, updated, adr
id title type status author created updated adr
library-evaluation-policy Library evaluation policy — skill, traces, enforcement stack prd approved danijel 2026-05-14T00:00:00Z 2026-05-14T19:16:52.691Z adr-022

Problem

This template ships with a deliberately narrow third-party surface — every feature package today holds the same 6 runtime deps and nothing else. That discipline is uncodified. New dependencies enter via pnpm add <pkg> with no checkpoint between intent and lockfile, and three recent signals show the gap:

  1. The 2026-05-14 grill session nearly added trpc-to-openapi + zod-to-json-schema
    • a build-time generator before someone asked "who calls this code path?" The honest answer was "nobody — all callers are TypeScript via createCaller."
  2. ADR-002 (Inversify), ADR-014 (Sentry), ADR-017 (OpenTelemetry) each record library decisions, but every record was written after adoption. No mechanism exists to catch a bad choice before it becomes a migration project.
  3. The repo is EU-resident and GDPR-bound. A library that defaults to a US-only SaaS endpoint silently moves user data out of the EU the moment it's imported with defaults. Nothing currently flags this.

ADR-022 codifies the policy. This PRD implements it.

Goal

A four-layer enforcement stack — Claude hook, skill, pre-commit hook, sandcastle reviewer prompt — that makes every new runtime dependency in a feature- or core-tier package produce a permanent library trace at docs/library-decisions/<YYYY-MM-DD>-<package-name>.md, with rejection traces treated as first-class records.

In scope

  • The evaluate-library skill at .claude/skills/evaluate-library/SKILL.md — authoritative agent runbook; walks 8 hard filters + 3 prompts; writes the trace.
  • The human reading-room guide at docs/guides/adding-a-library.md with worked examples (approved + rejected).
  • The docs/library-decisions/ directory + _template.md schema reference.
  • A Zod-validated trace-schema module (scripts/library-decisions/schema.mjs) shared by the skill, the pre-commit checker, and the generator.
  • Claude PreToolUse hook (.claude/hooks/library-policy-nudge.sh) — matches pnpm add / pnpm i <pkg> in Bash invocations; emits skill reminder.
  • Claude PostToolUse hook (also in library-policy-nudge.sh, dispatching by event type) — matches Edit/Write against any **/package.json.
  • Pre-commit hook check script (scripts/library-decisions/check.mjs) wired into .husky/pre-commit. Blocks the commit when a new runtime dep is staged in a feature/core package and no sibling trace file is staged.
  • Sandcastle reviewer prompt update (.sandcastle/reviewer.prompt.md) — the reviewer agent runs the same check before issuing approve/reject.
  • Optional-cores generator templates (turbo/generators/templates/core-package/) emit pre-shipped traces per direct dep, dated at generation time, marked decision: approved, citing the relevant ADR (015/016/018). Five generators updated: events, realtime, audit, trpc, ui.
  • Backfill traces for every existing runtime dependency in feature- and core- tier packages, dated 2026-05-14, citing existing ADRs (002/014/017) where applicable. Approx 10 traces.
  • CLAUDE.md "Key Conventions" gets a one-line bullet pointing to ADR-022 + the guide.

Out of scope

  • Transitive dependency tracing — pnpm audit already handles recursive scanning.
  • Bundle-size analysis — Vercel / Vite build output already reports this.
  • Auto-removal of approved-then-unused deps — pnpm fallow territory.
  • License auto-enforcement at the lockfile layer (license-checker plugins) — defer until the policy has run for some time and we know where it leaks.
  • Anything app-tier. Deps in apps/* are author's call per the tier model.
  • Devdeps in any tier. Only dependencies (runtime) require traces.

Constraints

  • ADR-022 is the source of truth. This PRD implements but does not extend it.
  • ADR-006 + ADR-010 — the tier trigger maps onto the existing boundary-tag system. No new mental model; ESLint already partitions blast radius.
  • ADR-019 — the sandcastle reviewer prompt is one of four enforcement layers. Whatever the agent loop does must compose with the existing prompt shape at .sandcastle/reviewer.prompt.md.
  • ADR-021 — release-please picks up dependency changes from commit history. The trace file landing in the same commit as the package.json change is required so release notes correlate cleanly with policy records.
  • Conformance system parity — the enforcement stack mirrors the 5-gate latency pattern from ADR-012. Same vocabulary, same agent feedback loop.
  • Conventional Commits — every commit produced by the implementation follows <type>(<scope>): <subject>.
  • --no-verify is forbidden — the bash-guard hook already enforces this; the new pre-commit check inherits that protection.
  • Skill must be deterministic from explicit args/evaluate-library <name> --tier <feature|core|app> --target <package-path>. The Claude hook produces exactly this invocation from a pnpm add command line.

Success criteria

  • pnpm typecheck && pnpm test && pnpm lint && pnpm conformance && pnpm fallow:audit pass green at the end of the epic.
  • pnpm coverage:diff covers every changed executable line introduced by the implementation slices.
  • Attempting to commit a new feature-tier runtime dep without a sibling trace file is blocked by the pre-commit hook with a clear error pointing to the skill.
  • Running the evaluate-library skill against trpc-to-openapi (the rejected library from the grill session) produces a decision: rejected trace with named-consumer: fail and prose citing the conversation, in <90 seconds of agent work.
  • Running pnpm turbo gen core-package events (or any other optional core) emits pre-shipped traces for every direct dep of that core into docs/library-decisions/, all decision: approved and ADR-cited.
  • The Claude PreToolUse hook fires on pnpm add <anything> and emits the skill-reminder system-reminder; does not auto-deny.
  • All existing runtime deps in feature- and core-tier packages have backfilled trace files dated 2026-05-14 in docs/library-decisions/.
  • CLAUDE.md Key Conventions includes the one-line policy bullet.
  • docs/glossary.md already includes Library trace and Pre-shipped trace entries (landed inline during the grill session).

User stories

  1. As a developer adding a new feature dependency, I want a deterministic skill that walks me through the 8 filters and 3 prompts in collect-cheap- skip-expensive order, so I don't forget any check and the trace file is written automatically with my answers.
  2. As an agent dispatched against a slice that needs a new dep, I want the Claude PreToolUse hook to inject a system-reminder pointing me at the skill the moment I'm about to run pnpm add, so I don't bypass the policy by reflex.
  3. As an agent editing a package.json by hand, I want the Claude PostToolUse hook to inject the same reminder, so the policy isn't sidestepped by paste-then-install.
  4. As a reviewer (human or agent), I want the pre-commit hook to refuse a commit that adds a runtime dep to a feature/core package without a sibling trace file, so I don't have to remember to check during review.
  5. As a future agent considering a previously-rejected library, I want to find the rejection trace in docs/library-decisions/ in <1s of ls/grep, so I don't re-litigate a decision that has already been made.
  6. As an EU-resident maintainer, I want the EU-data-residency filter to reject US-only SaaS components by default and force a self-hostable or EU-region-configured justification in the trace, so user data doesn't leave the EU silently.
  7. As a maintainer scaffolding an optional core via pnpm turbo gen core-package <name>, I want pre-shipped traces emitted for every direct dep of the new core, so the policy is satisfied by construction.
  8. As a security-conscious maintainer, I want the CVE-scan filter to run pnpm audit --audit-level=moderate at evaluation time and snapshot the result + commands into the trace, so I can re-run them later to detect drift.
  9. As an agent reviewing a slice in sandcastle, I want the reviewer prompt to check for trace presence + correctness, so I can reject incompliant slices without needing a separate workflow.
  10. As a maintainer reading the repo for the first time, I want docs/guides/adding-a-library.md to explain the policy with worked examples (one approved, one rejected), so I understand the why and how before I face the gate myself.

Implementation decisions

Module sketch — what lands where, by package and concern (no file paths where prose suffices):

  • The skill itself.claude/skills/evaluate-library/ follows the same shape as to-prd, grill-with-docs, improve-codebase-architecture. SKILL.md is authoritative; supporting files (POLICY.md mirroring ADR-022, TRACE-TEMPLATE.md showing the YAML+headings shape, EXAMPLES/ worked cases) flesh it out. The skill is invocable via slash command /evaluate-library.
  • Trace schema module — a small shared module at scripts/library-decisions/schema.mjs exporting (1) a Zod schema for the trace's frontmatter, (2) a parser that reads a trace file and returns the validated frontmatter, (3) a serializer that takes filter results + prose blocks and emits a trace file. Both the skill and the pre-commit checker import this module. Deep module — small interface (parse/serialize/validate), high leverage across the four enforcement layers.
  • Pre-commit check scriptscripts/library-decisions/check.mjs. Walks git diff --cached --name-only -- '**/package.json', for each file extracts newly-added dep lines via git diff --cached <file>, derives the tier from the path, and for each new runtime dep checks that docs/library-decisions/*-<name>.md is also staged with decision: approved. Exits non-zero with a pointer to the skill if any check fails. Invoked from .husky/pre-commit as step 4 (after the existing state-sync guard).
  • Claude hooks — a single .claude/hooks/library-policy-nudge.sh that dispatches on tool_use_type to handle both PreToolUse (Bash with pnpm add / pnpm i <pkg> pattern) and PostToolUse (Edit/Write on **/package.json). Same style as the existing generator-first-nudge.sh. Emits a non-blocking system-reminder to stdout that the harness threads into the agent's next turn.
  • Sandcastle reviewer prompt — append a "Library-trace check" section to .sandcastle/reviewer.prompt.md. The reviewer runs node scripts/library-decisions/check.mjs --staged-against <base> in the sandbox before issuing its verdict.
  • Generator templates — each turbo/generators/templates/core-package/<name>/ gets a docs/library-decisions/ subtree with one .md per direct dep of that core. The generator copies these alongside the core's package.json into the workspace. Frozen via the existing turbo/generators/__snapshots__/core-package/<name>.snapshot.json mechanism.
  • Backfill traces — write one trace per existing runtime dep in feature/ core tier. The deps cluster naturally by ADR provenance: ADR-002 cluster (Inversify + reflect-metadata), ADR-014 cluster (Sentry family), ADR-017 cluster (OpenTelemetry family), and the un-cited cluster (payload, @trpc/server, zod, superjson, plus any others surfaced by inventory). All traces dated 2026-05-14, decision: approved, ADR citation where the cluster maps to one.
  • CLAUDE.md update — one bullet in Key Conventions: "New runtime dependencies in feature- or core-tier packages require a trace at docs/library-decisions/<date>-<name>.md produced by the evaluate-library skill — see ADR-022."

Trace schema (frontmatter) — Zod schema (lifted from ADR-022 §4):

package: string
version: string                  // semver range as written in package.json
tier: "app" | "feature" | "core"
decision: "approved" | "rejected"
date: ISO date string
deciders: string[]
adr: string | null               // "adr-NNN" slug or null
filter-results: {
  license: SPDX-id-string
  types: "native" | `@types/${string}` | "none"
  maintenance: "active" | "dormant" | "abandoned"
  boundary-fit: "pass" | "fail"
  shadow-check: "pass" | "fail" | `shadows ${string}`
  eu-residency: "ok" | "n/a" | "self-hostable" | "fail"
  cve-scan: "clean" | `${cve-id}` | "fail"
  named-consumer: "pass" | "fail"
}
verification-commands: string[]
accepted-cves?: string[]         // optional per-trace allowlist

Headings (machine-checkable order): one ## Filter: <name> per filter + one ## Prompt: <name> per prompt, in the order listed in ADR-022.

Skill fail behavior — collect-cheap-skip-expensive. The four cheap structural filters (license, types, shadow-check, boundary-fit) always run to completion. The four expensive filters (maintenance, CVE scan, EU residency, named-consumer) short-circuit after the first reject. The trace records which filters ran and which were skipped, with a "skipped because earlier filter already rejected" sentinel value.

Pre-commit hook decision-state check — beyond presence, the script also verifies that the trace's decision matches the dep status: a dep listed in package.json requires decision: approved; a trace with decision: rejected that names a dep that's also in the package.json is a hard fail (rejected libraries cannot ship).

Conformance system composition — no new use cases, controllers, manifest entries, audits, events, or jobs. This PRD is a workflow/policy implementation, not a feature-domain change. The conformance gates apply only to the new TypeScript/JS modules (Zod schema module + check script) — they get standard vitest coverage.

Testing decisions

  • scripts/library-decisions/schema.mjs — unit tests covering: valid trace parses round-trip; missing required field rejected; unknown filter rejected; invalid enum value rejected.
  • scripts/library-decisions/check.mjs — integration tests covering: new feature-tier dep without trace → fail with exit 1; new feature-tier dep with approved trace → pass; new feature-tier dep with rejected trace listed in package.json → fail; new app-tier dep (no trace required) → pass; new devdep → pass (devdeps exempt); multi-file staged diff with mixed pass/fail → fail with per-package report; non-runtime dep (peerDependencies only) → pass. Use a temp git repo as the test fixture.
  • The skill — no automated test in the conformance sense (it's a prose runbook for an agent). The success criterion is that running it against trpc-to-openapi produces the documented rejection trace; verified manually during the epic.
  • Generator pre-shipped traces — the existing turbo/generators/__snapshots__/core-package/<name>.snapshot.json snapshot test extends to cover the new trace files. A failing snapshot is the gate.
  • Claude hook scripts — bash smoke tests that pipe a mocked Claude hook payload ({ "tool_input": { "command": "pnpm add foo" } }) into the script and assert stderr contains the skill-reminder marker. Same style as generator-first-nudge.sh's existing tests if any (check during impl).
  • Prior art — mirror the test patterns from scripts/work/state-sync-guard.mjs (the pre-commit _state.json check) for the new check script; the fixture/assertion shape carries over directly.
  • Coverage bands — the new scripts under scripts/library-decisions/ are not feature packages, so they don't have a feature.manifest.ts and aren't bound by per-layer coverage thresholds. Add them to the diff-coverage exception list only if the diff-coverage gate is too strict on script files; default is they should hit 100% statement coverage because they're small.

Open questions

  • Q1: CVE-accepted-risk mechanism — per-trace accepted-cves: ["CVE-XXXX-YYYY"] frontmatter array vs central docs/library-decisions/_cve-allowlist.md? — Recommended: per-trace. Acceptance is library-scoped, not global; a central file becomes a god-object that no agent reads in full.
  • Q2: Should the policy also gate peerDependencies additions, or only dependencies? — Recommended: only dependencies. Peer deps are a contract, not a runtime addition; if a feature declares a peer, the actual runtime adopter (an app or another package) is the one whose dependencies the policy catches.
  • Q3: Should the backfill be one commit per trace or one batch commit per ADR cluster? — Recommended: one commit per cluster (4 commits total), each with conventional chore(deps): backfill library traces for <cluster>. Avoids both extremes (1 mega-commit and 10 noisy single-trace commits).
  • Q4: Should the skill be permitted to write the trace before the user/agent approves the final decision, or must the final write be a separate explicit step? — Recommended: skill writes the trace unconditionally at the end of evaluation; the trace IS the record, including for rejections. No "draft" state.
  • Q5: Where do the Claude hooks register themselves? — Investigate: the existing .claude/hooks/*.sh are referenced by some kind of settings file or auto-discovered. Confirm during the first slice that adds the new hook script and matches the existing registration pattern.

Out of scope (deferred)

  • Periodic re-verification. Running each trace's verification-commands on a schedule (nightly?) to detect drift — new CVEs, license changes, upstream abandonment. Deserves its own PRD; would compose with pnpm fallow as a sixth gate.
  • Auto-generated PR comments that summarize the trace for human reviewers. Nice-to-have once the policy has lived for a quarter.
  • pnpm libs <subcommand> ergonomic CLI — wrapping check.mjs as pnpm libs check, plus pnpm libs list, pnpm libs orphans, etc. Defer until the raw script proves the workflow.

Further notes

  • Anchored by ADR-022 — Library evaluation policy. Read that first.
  • Glossary entries for Library trace and Pre-shipped trace landed during the 2026-05-14 grill session that produced ADR-022.
  • Conversation provenance — the 2026-05-14 grill-with-docs session that produced this PRD is captured in the session transcript; ADR-022 cites the OpenAPI near-miss as concrete catalyst.