agentic-dev/library-evaluation-policy.prd.md at bae4b66fa4798a15b0b4a2aa49b3b00bfd00a4fb

Template

Files

Danijel Martinek bae4b66fa4 refactor(work): drop date prefixes + move _state.json into _system/

Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.

Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
    -> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
    -> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
    -> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
    -> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
    -> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
    -> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
    -> docs/work/prds/ci-security-and-supply-chain.prd.md

Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.

System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
  new path.

template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).

Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
  ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
  bare-slug id field + ISO created: timestamp.

Doc reference updates: glossary, runbook, agent-first-workflow-
and-conformance, reviewer prompt, ADR-020, ADR-022, ADR-023 all
point at the new paths/slugs.

2026-05-14 21:16:51 +02:00

18 KiB

Raw Blame History

id, title, type, status, author, created, updated, adr

id	title	type	status	author	created	updated	adr
library-evaluation-policy	Library evaluation policy — skill, traces, enforcement stack	prd	approved	danijel	2026-05-14T00:00:00Z	2026-05-14T19:16:52.691Z	adr-022

Problem

This template ships with a deliberately narrow third-party surface — every feature package today holds the same 6 runtime deps and nothing else. That discipline is uncodified. New dependencies enter via pnpm add <pkg> with no checkpoint between intent and lockfile, and three recent signals show the gap:

The 2026-05-14 grill session nearly added trpc-to-openapi + zod-to-json-schema
- a build-time generator before someone asked "who calls this code path?" The honest answer was "nobody — all callers are TypeScript via createCaller."
ADR-002 (Inversify), ADR-014 (Sentry), ADR-017 (OpenTelemetry) each record library decisions, but every record was written after adoption. No mechanism exists to catch a bad choice before it becomes a migration project.
The repo is EU-resident and GDPR-bound. A library that defaults to a US-only SaaS endpoint silently moves user data out of the EU the moment it's imported with defaults. Nothing currently flags this.

ADR-022 codifies the policy. This PRD implements it.

Goal

A four-layer enforcement stack — Claude hook, skill, pre-commit hook, sandcastle reviewer prompt — that makes every new runtime dependency in a feature- or core-tier package produce a permanent library trace at docs/library-decisions/<YYYY-MM-DD>-<package-name>.md, with rejection traces treated as first-class records.

In scope

The evaluate-library skill at .claude/skills/evaluate-library/SKILL.md — authoritative agent runbook; walks 8 hard filters + 3 prompts; writes the trace.
The human reading-room guide at docs/guides/adding-a-library.md with worked examples (approved + rejected).
The docs/library-decisions/ directory + _template.md schema reference.
A Zod-validated trace-schema module (scripts/library-decisions/schema.mjs) shared by the skill, the pre-commit checker, and the generator.
Claude PreToolUse hook (.claude/hooks/library-policy-nudge.sh) — matches pnpm add / pnpm i <pkg> in Bash invocations; emits skill reminder.
Claude PostToolUse hook (also in library-policy-nudge.sh, dispatching by event type) — matches Edit/Write against any **/package.json.
Pre-commit hook check script (scripts/library-decisions/check.mjs) wired into .husky/pre-commit. Blocks the commit when a new runtime dep is staged in a feature/core package and no sibling trace file is staged.
Sandcastle reviewer prompt update (.sandcastle/reviewer.prompt.md) — the reviewer agent runs the same check before issuing approve/reject.
Optional-cores generator templates (turbo/generators/templates/core-package/) emit pre-shipped traces per direct dep, dated at generation time, marked decision: approved, citing the relevant ADR (015/016/018). Five generators updated: events, realtime, audit, trpc, ui.
Backfill traces for every existing runtime dependency in feature- and core- tier packages, dated 2026-05-14, citing existing ADRs (002/014/017) where applicable. Approx 10 traces.
CLAUDE.md "Key Conventions" gets a one-line bullet pointing to ADR-022 + the guide.

Out of scope

Transitive dependency tracing — pnpm audit already handles recursive scanning.
Bundle-size analysis — Vercel / Vite build output already reports this.
Auto-removal of approved-then-unused deps — pnpm fallow territory.
License auto-enforcement at the lockfile layer (license-checker plugins) — defer until the policy has run for some time and we know where it leaks.
Anything app-tier. Deps in apps/* are author's call per the tier model.
Devdeps in any tier. Only dependencies (runtime) require traces.

Constraints

ADR-022 is the source of truth. This PRD implements but does not extend it.
ADR-006 + ADR-010 — the tier trigger maps onto the existing boundary-tag system. No new mental model; ESLint already partitions blast radius.
ADR-019 — the sandcastle reviewer prompt is one of four enforcement layers. Whatever the agent loop does must compose with the existing prompt shape at .sandcastle/reviewer.prompt.md.
ADR-021 — release-please picks up dependency changes from commit history. The trace file landing in the same commit as the package.json change is required so release notes correlate cleanly with policy records.
Conformance system parity — the enforcement stack mirrors the 5-gate latency pattern from ADR-012. Same vocabulary, same agent feedback loop.
Conventional Commits — every commit produced by the implementation follows <type>(<scope>): <subject>.
--no-verify is forbidden — the bash-guard hook already enforces this; the new pre-commit check inherits that protection.
Skill must be deterministic from explicit args — /evaluate-library <name> --tier <feature|core|app> --target <package-path>. The Claude hook produces exactly this invocation from a pnpm add command line.

Success criteria

pnpm typecheck && pnpm test && pnpm lint && pnpm conformance && pnpm fallow:audit pass green at the end of the epic.
pnpm coverage:diff covers every changed executable line introduced by the implementation slices.
Attempting to commit a new feature-tier runtime dep without a sibling trace file is blocked by the pre-commit hook with a clear error pointing to the skill.
Running the evaluate-library skill against trpc-to-openapi (the rejected library from the grill session) produces a decision: rejected trace with named-consumer: fail and prose citing the conversation, in <90 seconds of agent work.
Running pnpm turbo gen core-package events (or any other optional core) emits pre-shipped traces for every direct dep of that core into docs/library-decisions/, all decision: approved and ADR-cited.
The Claude PreToolUse hook fires on pnpm add <anything> and emits the skill-reminder system-reminder; does not auto-deny.
All existing runtime deps in feature- and core-tier packages have backfilled trace files dated 2026-05-14 in docs/library-decisions/.
CLAUDE.md Key Conventions includes the one-line policy bullet.
docs/glossary.md already includes Library trace and Pre-shipped trace entries (landed inline during the grill session).

User stories

As a developer adding a new feature dependency, I want a deterministic skill that walks me through the 8 filters and 3 prompts in collect-cheap- skip-expensive order, so I don't forget any check and the trace file is written automatically with my answers.
As an agent dispatched against a slice that needs a new dep, I want the Claude PreToolUse hook to inject a system-reminder pointing me at the skill the moment I'm about to run pnpm add, so I don't bypass the policy by reflex.
As an agent editing a package.json by hand, I want the Claude PostToolUse hook to inject the same reminder, so the policy isn't sidestepped by paste-then-install.
As a reviewer (human or agent), I want the pre-commit hook to refuse a commit that adds a runtime dep to a feature/core package without a sibling trace file, so I don't have to remember to check during review.
As a future agent considering a previously-rejected library, I want to find the rejection trace in docs/library-decisions/ in <1s of ls/grep, so I don't re-litigate a decision that has already been made.
As an EU-resident maintainer, I want the EU-data-residency filter to reject US-only SaaS components by default and force a self-hostable or EU-region-configured justification in the trace, so user data doesn't leave the EU silently.
As a maintainer scaffolding an optional core via pnpm turbo gen core-package <name>, I want pre-shipped traces emitted for every direct dep of the new core, so the policy is satisfied by construction.
As a security-conscious maintainer, I want the CVE-scan filter to run pnpm audit --audit-level=moderate at evaluation time and snapshot the result + commands into the trace, so I can re-run them later to detect drift.
As an agent reviewing a slice in sandcastle, I want the reviewer prompt to check for trace presence + correctness, so I can reject incompliant slices without needing a separate workflow.
As a maintainer reading the repo for the first time, I want docs/guides/adding-a-library.md to explain the policy with worked examples (one approved, one rejected), so I understand the why and how before I face the gate myself.

Implementation decisions

Module sketch — what lands where, by package and concern (no file paths where prose suffices):

The skill itself — .claude/skills/evaluate-library/ follows the same shape as to-prd, grill-with-docs, improve-codebase-architecture. SKILL.md is authoritative; supporting files (POLICY.md mirroring ADR-022, TRACE-TEMPLATE.md showing the YAML+headings shape, EXAMPLES/ worked cases) flesh it out. The skill is invocable via slash command /evaluate-library.
Trace schema module — a small shared module at scripts/library-decisions/schema.mjs exporting (1) a Zod schema for the trace's frontmatter, (2) a parser that reads a trace file and returns the validated frontmatter, (3) a serializer that takes filter results + prose blocks and emits a trace file. Both the skill and the pre-commit checker import this module. Deep module — small interface (parse/serialize/validate), high leverage across the four enforcement layers.
Pre-commit check script — scripts/library-decisions/check.mjs. Walks git diff --cached --name-only -- '**/package.json', for each file extracts newly-added dep lines via git diff --cached <file>, derives the tier from the path, and for each new runtime dep checks that docs/library-decisions/*-<name>.md is also staged with decision: approved. Exits non-zero with a pointer to the skill if any check fails. Invoked from .husky/pre-commit as step 4 (after the existing state-sync guard).
Claude hooks — a single .claude/hooks/library-policy-nudge.sh that dispatches on tool_use_type to handle both PreToolUse (Bash with pnpm add / pnpm i <pkg> pattern) and PostToolUse (Edit/Write on **/package.json). Same style as the existing generator-first-nudge.sh. Emits a non-blocking system-reminder to stdout that the harness threads into the agent's next turn.
Sandcastle reviewer prompt — append a "Library-trace check" section to .sandcastle/reviewer.prompt.md. The reviewer runs node scripts/library-decisions/check.mjs --staged-against <base> in the sandbox before issuing its verdict.
Generator templates — each turbo/generators/templates/core-package/<name>/ gets a docs/library-decisions/ subtree with one .md per direct dep of that core. The generator copies these alongside the core's package.json into the workspace. Frozen via the existing turbo/generators/__snapshots__/core-package/<name>.snapshot.json mechanism.
Backfill traces — write one trace per existing runtime dep in feature/ core tier. The deps cluster naturally by ADR provenance: ADR-002 cluster (Inversify + reflect-metadata), ADR-014 cluster (Sentry family), ADR-017 cluster (OpenTelemetry family), and the un-cited cluster (payload, @trpc/server, zod, superjson, plus any others surfaced by inventory). All traces dated 2026-05-14, decision: approved, ADR citation where the cluster maps to one.
CLAUDE.md update — one bullet in Key Conventions: "New runtime dependencies in feature- or core-tier packages require a trace at docs/library-decisions/<date>-<name>.md produced by the evaluate-library skill — see ADR-022."

Trace schema (frontmatter) — Zod schema (lifted from ADR-022 §4):

package: string
version: string                  // semver range as written in package.json
tier: "app" | "feature" | "core"
decision: "approved" | "rejected"
date: ISO date string
deciders: string[]
adr: string | null               // "adr-NNN" slug or null
filter-results: {
  license: SPDX-id-string
  types: "native" | `@types/${string}` | "none"
  maintenance: "active" | "dormant" | "abandoned"
  boundary-fit: "pass" | "fail"
  shadow-check: "pass" | "fail" | `shadows ${string}`
  eu-residency: "ok" | "n/a" | "self-hostable" | "fail"
  cve-scan: "clean" | `${cve-id}` | "fail"
  named-consumer: "pass" | "fail"
}
verification-commands: string[]
accepted-cves?: string[]         // optional per-trace allowlist

Headings (machine-checkable order): one ## Filter: <name> per filter + one ## Prompt: <name> per prompt, in the order listed in ADR-022.

Skill fail behavior — collect-cheap-skip-expensive. The four cheap structural filters (license, types, shadow-check, boundary-fit) always run to completion. The four expensive filters (maintenance, CVE scan, EU residency, named-consumer) short-circuit after the first reject. The trace records which filters ran and which were skipped, with a "skipped because earlier filter already rejected" sentinel value.

Pre-commit hook decision-state check — beyond presence, the script also verifies that the trace's decision matches the dep status: a dep listed in package.json requires decision: approved; a trace with decision: rejected that names a dep that's also in the package.json is a hard fail (rejected libraries cannot ship).

Conformance system composition — no new use cases, controllers, manifest entries, audits, events, or jobs. This PRD is a workflow/policy implementation, not a feature-domain change. The conformance gates apply only to the new TypeScript/JS modules (Zod schema module + check script) — they get standard vitest coverage.

Testing decisions

scripts/library-decisions/schema.mjs — unit tests covering: valid trace parses round-trip; missing required field rejected; unknown filter rejected; invalid enum value rejected.
scripts/library-decisions/check.mjs — integration tests covering: new feature-tier dep without trace → fail with exit 1; new feature-tier dep with approved trace → pass; new feature-tier dep with rejected trace listed in package.json → fail; new app-tier dep (no trace required) → pass; new devdep → pass (devdeps exempt); multi-file staged diff with mixed pass/fail → fail with per-package report; non-runtime dep (peerDependencies only) → pass. Use a temp git repo as the test fixture.
The skill — no automated test in the conformance sense (it's a prose runbook for an agent). The success criterion is that running it against trpc-to-openapi produces the documented rejection trace; verified manually during the epic.
Generator pre-shipped traces — the existing turbo/generators/__snapshots__/core-package/<name>.snapshot.json snapshot test extends to cover the new trace files. A failing snapshot is the gate.
Claude hook scripts — bash smoke tests that pipe a mocked Claude hook payload ({ "tool_input": { "command": "pnpm add foo" } }) into the script and assert stderr contains the skill-reminder marker. Same style as generator-first-nudge.sh's existing tests if any (check during impl).
Prior art — mirror the test patterns from scripts/work/state-sync-guard.mjs (the pre-commit _state.json check) for the new check script; the fixture/assertion shape carries over directly.
Coverage bands — the new scripts under scripts/library-decisions/ are not feature packages, so they don't have a feature.manifest.ts and aren't bound by per-layer coverage thresholds. Add them to the diff-coverage exception list only if the diff-coverage gate is too strict on script files; default is they should hit 100% statement coverage because they're small.

Open questions

Q1: CVE-accepted-risk mechanism — per-trace accepted-cves: ["CVE-XXXX-YYYY"] frontmatter array vs central docs/library-decisions/_cve-allowlist.md? — Recommended: per-trace. Acceptance is library-scoped, not global; a central file becomes a god-object that no agent reads in full.
Q2: Should the policy also gate peerDependencies additions, or only dependencies? — Recommended: only dependencies. Peer deps are a contract, not a runtime addition; if a feature declares a peer, the actual runtime adopter (an app or another package) is the one whose dependencies the policy catches.
Q3: Should the backfill be one commit per trace or one batch commit per ADR cluster? — Recommended: one commit per cluster (4 commits total), each with conventional chore(deps): backfill library traces for <cluster>. Avoids both extremes (1 mega-commit and 10 noisy single-trace commits).
Q4: Should the skill be permitted to write the trace before the user/agent approves the final decision, or must the final write be a separate explicit step? — Recommended: skill writes the trace unconditionally at the end of evaluation; the trace IS the record, including for rejections. No "draft" state.
Q5: Where do the Claude hooks register themselves? — Investigate: the existing .claude/hooks/*.sh are referenced by some kind of settings file or auto-discovered. Confirm during the first slice that adds the new hook script and matches the existing registration pattern.

Out of scope (deferred)

Periodic re-verification. Running each trace's verification-commands on a schedule (nightly?) to detect drift — new CVEs, license changes, upstream abandonment. Deserves its own PRD; would compose with pnpm fallow as a sixth gate.
Auto-generated PR comments that summarize the trace for human reviewers. Nice-to-have once the policy has lived for a quarter.
pnpm libs <subcommand> ergonomic CLI — wrapping check.mjs as pnpm libs check, plus pnpm libs list, pnpm libs orphans, etc. Defer until the raw script proves the workflow.

Further notes

Anchored by ADR-022 — Library evaluation policy. Read that first.
Glossary entries for Library trace and Pre-shipped trace landed during the 2026-05-14 grill session that produced ADR-022.
Conversation provenance — the 2026-05-14 grill-with-docs session that produced this PRD is captured in the session transcript; ADR-022 cites the OpenAPI near-miss as concrete catalyst.

18 KiB Raw Blame History