Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.
Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
-> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
-> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
-> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
-> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
-> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
-> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
-> docs/work/prds/ci-security-and-supply-chain.prd.md
Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.
System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
new path.
template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).
Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
bare-slug id field + ISO created: timestamp.
Doc reference updates: glossary, runbook, agent-first-workflow-
and-conformance, reviewer prompt, ADR-020, ADR-022, ADR-023 all
point at the new paths/slugs.
335 lines
18 KiB
Markdown
335 lines
18 KiB
Markdown
---
|
|
id: library-evaluation-policy
|
|
title: Library evaluation policy — skill, traces, enforcement stack
|
|
type: prd
|
|
status: approved
|
|
author: danijel
|
|
created: 2026-05-14T00:00:00Z
|
|
updated: 2026-05-14T19:16:52.691Z
|
|
adr: adr-022
|
|
---
|
|
|
|
## Problem
|
|
|
|
This template ships with a deliberately narrow third-party surface — every
|
|
feature package today holds the same 6 runtime deps and nothing else. That
|
|
discipline is uncodified. New dependencies enter via `pnpm add <pkg>` with no
|
|
checkpoint between intent and lockfile, and three recent signals show the gap:
|
|
|
|
1. The 2026-05-14 grill session nearly added `trpc-to-openapi` + `zod-to-json-schema`
|
|
- a build-time generator before someone asked "who calls this code path?"
|
|
The honest answer was "nobody — all callers are TypeScript via `createCaller`."
|
|
2. ADR-002 (Inversify), ADR-014 (Sentry), ADR-017 (OpenTelemetry) each record
|
|
library decisions, but every record was written _after_ adoption. No mechanism
|
|
exists to catch a bad choice before it becomes a migration project.
|
|
3. The repo is EU-resident and GDPR-bound. A library that defaults to a US-only
|
|
SaaS endpoint silently moves user data out of the EU the moment it's
|
|
imported with defaults. Nothing currently flags this.
|
|
|
|
ADR-022 codifies the policy. This PRD implements it.
|
|
|
|
## Goal
|
|
|
|
A four-layer enforcement stack — Claude hook, skill, pre-commit hook, sandcastle
|
|
reviewer prompt — that makes every new runtime dependency in a feature- or
|
|
core-tier package produce a permanent **library trace** at
|
|
`docs/library-decisions/<YYYY-MM-DD>-<package-name>.md`, with rejection traces
|
|
treated as first-class records.
|
|
|
|
## In scope
|
|
|
|
- The `evaluate-library` skill at `.claude/skills/evaluate-library/SKILL.md` —
|
|
authoritative agent runbook; walks 8 hard filters + 3 prompts; writes the trace.
|
|
- The human reading-room guide at `docs/guides/adding-a-library.md` with worked
|
|
examples (approved + rejected).
|
|
- The `docs/library-decisions/` directory + `_template.md` schema reference.
|
|
- A Zod-validated trace-schema module (`scripts/library-decisions/schema.mjs`)
|
|
shared by the skill, the pre-commit checker, and the generator.
|
|
- Claude `PreToolUse` hook (`.claude/hooks/library-policy-nudge.sh`) — matches
|
|
`pnpm add` / `pnpm i <pkg>` in Bash invocations; emits skill reminder.
|
|
- Claude `PostToolUse` hook (also in `library-policy-nudge.sh`, dispatching by
|
|
event type) — matches `Edit`/`Write` against any `**/package.json`.
|
|
- Pre-commit hook check script (`scripts/library-decisions/check.mjs`) wired
|
|
into `.husky/pre-commit`. Blocks the commit when a new runtime dep is staged
|
|
in a feature/core package and no sibling trace file is staged.
|
|
- Sandcastle reviewer prompt update (`.sandcastle/reviewer.prompt.md`) — the
|
|
reviewer agent runs the same check before issuing approve/reject.
|
|
- Optional-cores generator templates (`turbo/generators/templates/core-package/`)
|
|
emit pre-shipped traces per direct dep, dated at generation time, marked
|
|
`decision: approved`, citing the relevant ADR (015/016/018). Five generators
|
|
updated: `events`, `realtime`, `audit`, `trpc`, `ui`.
|
|
- Backfill traces for every existing runtime dependency in feature- and core-
|
|
tier packages, dated 2026-05-14, citing existing ADRs (002/014/017) where
|
|
applicable. Approx 10 traces.
|
|
- `CLAUDE.md` "Key Conventions" gets a one-line bullet pointing to ADR-022 +
|
|
the guide.
|
|
|
|
## Out of scope
|
|
|
|
- Transitive dependency tracing — `pnpm audit` already handles recursive scanning.
|
|
- Bundle-size analysis — Vercel / Vite build output already reports this.
|
|
- Auto-removal of approved-then-unused deps — `pnpm fallow` territory.
|
|
- License auto-enforcement at the lockfile layer (license-checker plugins) —
|
|
defer until the policy has run for some time and we know where it leaks.
|
|
- Anything app-tier. Deps in `apps/*` are author's call per the tier model.
|
|
- Devdeps in any tier. Only `dependencies` (runtime) require traces.
|
|
|
|
## Constraints
|
|
|
|
- **ADR-022** is the source of truth. This PRD implements but does not extend it.
|
|
- **ADR-006 + ADR-010** — the tier trigger maps onto the existing boundary-tag
|
|
system. No new mental model; ESLint already partitions blast radius.
|
|
- **ADR-019** — the sandcastle reviewer prompt is one of four enforcement
|
|
layers. Whatever the agent loop does must compose with the existing prompt
|
|
shape at `.sandcastle/reviewer.prompt.md`.
|
|
- **ADR-021** — release-please picks up dependency changes from commit history.
|
|
The trace file landing in the **same commit** as the `package.json` change
|
|
is required so release notes correlate cleanly with policy records.
|
|
- **Conformance system parity** — the enforcement stack mirrors the 5-gate
|
|
latency pattern from ADR-012. Same vocabulary, same agent feedback loop.
|
|
- **Conventional Commits** — every commit produced by the implementation
|
|
follows `<type>(<scope>): <subject>`.
|
|
- **`--no-verify` is forbidden** — the bash-guard hook already enforces this;
|
|
the new pre-commit check inherits that protection.
|
|
- **Skill must be deterministic from explicit args** — `/evaluate-library <name>
|
|
--tier <feature|core|app> --target <package-path>`. The Claude hook produces
|
|
exactly this invocation from a `pnpm add` command line.
|
|
|
|
## Success criteria
|
|
|
|
- `pnpm typecheck && pnpm test && pnpm lint && pnpm conformance && pnpm fallow:audit`
|
|
pass green at the end of the epic.
|
|
- `pnpm coverage:diff` covers every changed executable line introduced by
|
|
the implementation slices.
|
|
- Attempting to commit a new feature-tier runtime dep without a sibling trace
|
|
file is blocked by the pre-commit hook with a clear error pointing to the skill.
|
|
- Running the `evaluate-library` skill against `trpc-to-openapi` (the rejected
|
|
library from the grill session) produces a `decision: rejected` trace with
|
|
`named-consumer: fail` and prose citing the conversation, in <90 seconds of
|
|
agent work.
|
|
- Running `pnpm turbo gen core-package events` (or any other optional core)
|
|
emits pre-shipped traces for every direct dep of that core into
|
|
`docs/library-decisions/`, all `decision: approved` and ADR-cited.
|
|
- The Claude `PreToolUse` hook fires on `pnpm add <anything>` and emits the
|
|
skill-reminder system-reminder; does **not** auto-deny.
|
|
- All existing runtime deps in feature- and core-tier packages have backfilled
|
|
trace files dated 2026-05-14 in `docs/library-decisions/`.
|
|
- `CLAUDE.md` Key Conventions includes the one-line policy bullet.
|
|
- `docs/glossary.md` already includes **Library trace** and **Pre-shipped
|
|
trace** entries (landed inline during the grill session).
|
|
|
|
## User stories
|
|
|
|
1. **As a developer adding a new feature dependency**, I want a deterministic
|
|
skill that walks me through the 8 filters and 3 prompts in collect-cheap-
|
|
skip-expensive order, so I don't forget any check and the trace file is
|
|
written automatically with my answers.
|
|
2. **As an agent dispatched against a slice that needs a new dep**, I want the
|
|
Claude `PreToolUse` hook to inject a system-reminder pointing me at the
|
|
skill the moment I'm about to run `pnpm add`, so I don't bypass the policy
|
|
by reflex.
|
|
3. **As an agent editing a `package.json` by hand**, I want the Claude
|
|
`PostToolUse` hook to inject the same reminder, so the policy isn't
|
|
sidestepped by paste-then-install.
|
|
4. **As a reviewer (human or agent)**, I want the pre-commit hook to refuse a
|
|
commit that adds a runtime dep to a feature/core package without a sibling
|
|
trace file, so I don't have to remember to check during review.
|
|
5. **As a future agent considering a previously-rejected library**, I want
|
|
to find the rejection trace in `docs/library-decisions/` in <1s of
|
|
`ls`/`grep`, so I don't re-litigate a decision that has already been made.
|
|
6. **As an EU-resident maintainer**, I want the EU-data-residency filter to
|
|
reject US-only SaaS components by default and force a `self-hostable` or
|
|
`EU-region-configured` justification in the trace, so user data doesn't
|
|
leave the EU silently.
|
|
7. **As a maintainer scaffolding an optional core via
|
|
`pnpm turbo gen core-package <name>`**, I want pre-shipped traces emitted
|
|
for every direct dep of the new core, so the policy is satisfied by
|
|
construction.
|
|
8. **As a security-conscious maintainer**, I want the CVE-scan filter to run
|
|
`pnpm audit --audit-level=moderate` at evaluation time and snapshot the
|
|
result + commands into the trace, so I can re-run them later to detect drift.
|
|
9. **As an agent reviewing a slice in sandcastle**, I want the reviewer prompt
|
|
to check for trace presence + correctness, so I can reject incompliant
|
|
slices without needing a separate workflow.
|
|
10. **As a maintainer reading the repo for the first time**, I want
|
|
`docs/guides/adding-a-library.md` to explain the policy with worked
|
|
examples (one approved, one rejected), so I understand the why and how
|
|
before I face the gate myself.
|
|
|
|
## Implementation decisions
|
|
|
|
**Module sketch** — what lands where, by package and concern (no file paths
|
|
where prose suffices):
|
|
|
|
- **The skill itself** — `.claude/skills/evaluate-library/` follows the same
|
|
shape as `to-prd`, `grill-with-docs`, `improve-codebase-architecture`.
|
|
SKILL.md is authoritative; supporting files (`POLICY.md` mirroring ADR-022,
|
|
`TRACE-TEMPLATE.md` showing the YAML+headings shape, `EXAMPLES/` worked
|
|
cases) flesh it out. The skill is invocable via slash command
|
|
`/evaluate-library`.
|
|
- **Trace schema module** — a small shared module at
|
|
`scripts/library-decisions/schema.mjs` exporting (1) a Zod schema for the
|
|
trace's frontmatter, (2) a parser that reads a trace file and returns the
|
|
validated frontmatter, (3) a serializer that takes filter results + prose
|
|
blocks and emits a trace file. Both the skill and the pre-commit checker
|
|
import this module. Deep module — small interface (parse/serialize/validate),
|
|
high leverage across the four enforcement layers.
|
|
- **Pre-commit check script** — `scripts/library-decisions/check.mjs`. Walks
|
|
`git diff --cached --name-only -- '**/package.json'`, for each file extracts
|
|
newly-added dep lines via `git diff --cached <file>`, derives the tier from
|
|
the path, and for each new runtime dep checks that
|
|
`docs/library-decisions/*-<name>.md` is also staged with `decision: approved`.
|
|
Exits non-zero with a pointer to the skill if any check fails. Invoked from
|
|
`.husky/pre-commit` as step 4 (after the existing state-sync guard).
|
|
- **Claude hooks** — a single `.claude/hooks/library-policy-nudge.sh` that
|
|
dispatches on `tool_use_type` to handle both `PreToolUse` (Bash with
|
|
`pnpm add` / `pnpm i <pkg>` pattern) and `PostToolUse` (Edit/Write on
|
|
`**/package.json`). Same style as the existing `generator-first-nudge.sh`.
|
|
Emits a non-blocking system-reminder to stdout that the harness threads
|
|
into the agent's next turn.
|
|
- **Sandcastle reviewer prompt** — append a "Library-trace check" section
|
|
to `.sandcastle/reviewer.prompt.md`. The reviewer runs
|
|
`node scripts/library-decisions/check.mjs --staged-against <base>` in the
|
|
sandbox before issuing its verdict.
|
|
- **Generator templates** — each `turbo/generators/templates/core-package/<name>/`
|
|
gets a `docs/library-decisions/` subtree with one `.md` per direct dep of
|
|
that core. The generator copies these alongside the core's package.json
|
|
into the workspace. Frozen via the existing
|
|
`turbo/generators/__snapshots__/core-package/<name>.snapshot.json` mechanism.
|
|
- **Backfill traces** — write one trace per existing runtime dep in feature/
|
|
core tier. The deps cluster naturally by ADR provenance: ADR-002 cluster
|
|
(Inversify + reflect-metadata), ADR-014 cluster (Sentry family), ADR-017
|
|
cluster (OpenTelemetry family), and the un-cited cluster (`payload`,
|
|
`@trpc/server`, `zod`, `superjson`, plus any others surfaced by inventory).
|
|
All traces dated 2026-05-14, `decision: approved`, ADR citation where the
|
|
cluster maps to one.
|
|
- **`CLAUDE.md` update** — one bullet in Key Conventions: _"New runtime
|
|
dependencies in feature- or core-tier packages require a trace at
|
|
`docs/library-decisions/<date>-<name>.md` produced by the `evaluate-library`
|
|
skill — see ADR-022."_
|
|
|
|
**Trace schema (frontmatter)** — Zod schema (lifted from ADR-022 §4):
|
|
|
|
```
|
|
package: string
|
|
version: string // semver range as written in package.json
|
|
tier: "app" | "feature" | "core"
|
|
decision: "approved" | "rejected"
|
|
date: ISO date string
|
|
deciders: string[]
|
|
adr: string | null // "adr-NNN" slug or null
|
|
filter-results: {
|
|
license: SPDX-id-string
|
|
types: "native" | `@types/${string}` | "none"
|
|
maintenance: "active" | "dormant" | "abandoned"
|
|
boundary-fit: "pass" | "fail"
|
|
shadow-check: "pass" | "fail" | `shadows ${string}`
|
|
eu-residency: "ok" | "n/a" | "self-hostable" | "fail"
|
|
cve-scan: "clean" | `${cve-id}` | "fail"
|
|
named-consumer: "pass" | "fail"
|
|
}
|
|
verification-commands: string[]
|
|
accepted-cves?: string[] // optional per-trace allowlist
|
|
```
|
|
|
|
Headings (machine-checkable order): one `## Filter: <name>` per filter +
|
|
one `## Prompt: <name>` per prompt, in the order listed in ADR-022.
|
|
|
|
**Skill fail behavior** — collect-cheap-skip-expensive. The four cheap
|
|
structural filters (license, types, shadow-check, boundary-fit) always run
|
|
to completion. The four expensive filters (maintenance, CVE scan, EU residency,
|
|
named-consumer) short-circuit after the first reject. The trace records which
|
|
filters ran and which were skipped, with a "skipped because earlier filter
|
|
already rejected" sentinel value.
|
|
|
|
**Pre-commit hook decision-state check** — beyond presence, the script also
|
|
verifies that the trace's `decision` matches the dep status: a dep listed in
|
|
`package.json` requires `decision: approved`; a trace with `decision: rejected`
|
|
that names a dep that's also in the package.json is a hard fail (rejected
|
|
libraries cannot ship).
|
|
|
|
**Conformance system composition** — no new use cases, controllers, manifest
|
|
entries, audits, events, or jobs. This PRD is a workflow/policy implementation,
|
|
not a feature-domain change. The conformance gates apply only to the new
|
|
TypeScript/JS modules (Zod schema module + check script) — they get standard
|
|
vitest coverage.
|
|
|
|
## Testing decisions
|
|
|
|
- **`scripts/library-decisions/schema.mjs`** — unit tests covering: valid
|
|
trace parses round-trip; missing required field rejected; unknown filter
|
|
rejected; invalid enum value rejected.
|
|
- **`scripts/library-decisions/check.mjs`** — integration tests covering: new
|
|
feature-tier dep without trace → fail with exit 1; new feature-tier dep with
|
|
approved trace → pass; new feature-tier dep with rejected trace listed in
|
|
package.json → fail; new app-tier dep (no trace required) → pass; new devdep
|
|
→ pass (devdeps exempt); multi-file staged diff with mixed pass/fail → fail
|
|
with per-package report; non-runtime dep (`peerDependencies` only) → pass.
|
|
Use a temp git repo as the test fixture.
|
|
- **The skill** — no automated test in the conformance sense (it's a prose
|
|
runbook for an agent). The success criterion is that running it against
|
|
`trpc-to-openapi` produces the documented rejection trace; verified manually
|
|
during the epic.
|
|
- **Generator pre-shipped traces** — the existing
|
|
`turbo/generators/__snapshots__/core-package/<name>.snapshot.json` snapshot
|
|
test extends to cover the new trace files. A failing snapshot is the gate.
|
|
- **Claude hook scripts** — bash smoke tests that pipe a mocked Claude
|
|
hook payload (`{ "tool_input": { "command": "pnpm add foo" } }`) into the
|
|
script and assert stderr contains the skill-reminder marker. Same style as
|
|
`generator-first-nudge.sh`'s existing tests if any (check during impl).
|
|
- **Prior art** — mirror the test patterns from `scripts/work/state-sync-guard.mjs`
|
|
(the pre-commit `_state.json` check) for the new check script; the
|
|
fixture/assertion shape carries over directly.
|
|
- **Coverage bands** — the new scripts under `scripts/library-decisions/` are
|
|
not feature packages, so they don't have a `feature.manifest.ts` and aren't
|
|
bound by per-layer coverage thresholds. Add them to the diff-coverage
|
|
exception list **only if** the diff-coverage gate is too strict on script
|
|
files; default is they should hit 100% statement coverage because they're
|
|
small.
|
|
|
|
## Open questions
|
|
|
|
- **Q1:** CVE-accepted-risk mechanism — per-trace `accepted-cves: ["CVE-XXXX-YYYY"]`
|
|
frontmatter array vs central `docs/library-decisions/_cve-allowlist.md`? —
|
|
**Recommended:** per-trace. Acceptance is library-scoped, not global; a
|
|
central file becomes a god-object that no agent reads in full.
|
|
- **Q2:** Should the policy also gate `peerDependencies` additions, or only
|
|
`dependencies`? — **Recommended:** only `dependencies`. Peer deps are a
|
|
contract, not a runtime addition; if a feature declares a peer, the actual
|
|
runtime adopter (an app or another package) is the one whose `dependencies`
|
|
the policy catches.
|
|
- **Q3:** Should the backfill be one commit per trace or one batch commit per
|
|
ADR cluster? — **Recommended:** one commit per cluster (4 commits total),
|
|
each with conventional `chore(deps): backfill library traces for <cluster>`.
|
|
Avoids both extremes (1 mega-commit and 10 noisy single-trace commits).
|
|
- **Q4:** Should the skill be permitted to **write the trace** before the
|
|
user/agent approves the final decision, or must the final write be a
|
|
separate explicit step? — **Recommended:** skill writes the trace
|
|
unconditionally at the end of evaluation; the trace IS the record, including
|
|
for rejections. No "draft" state.
|
|
- **Q5:** Where do the Claude hooks register themselves? — **Investigate:**
|
|
the existing `.claude/hooks/*.sh` are referenced by some kind of settings
|
|
file or auto-discovered. Confirm during the first slice that adds the new
|
|
hook script and matches the existing registration pattern.
|
|
|
|
## Out of scope (deferred)
|
|
|
|
- **Periodic re-verification.** Running each trace's `verification-commands`
|
|
on a schedule (nightly?) to detect drift — new CVEs, license changes,
|
|
upstream abandonment. Deserves its own PRD; would compose with `pnpm fallow`
|
|
as a sixth gate.
|
|
- **Auto-generated PR comments** that summarize the trace for human reviewers.
|
|
Nice-to-have once the policy has lived for a quarter.
|
|
- **`pnpm libs <subcommand>` ergonomic CLI** — wrapping `check.mjs` as
|
|
`pnpm libs check`, plus `pnpm libs list`, `pnpm libs orphans`, etc. Defer
|
|
until the raw script proves the workflow.
|
|
|
|
## Further notes
|
|
|
|
- **Anchored by ADR-022** — Library evaluation policy. Read that first.
|
|
- **Glossary entries** for **Library trace** and **Pre-shipped trace** landed
|
|
during the 2026-05-14 grill session that produced ADR-022.
|
|
- **Conversation provenance** — the 2026-05-14 grill-with-docs session that
|
|
produced this PRD is captured in the session transcript; ADR-022 cites the
|
|
OpenAPI near-miss as concrete catalyst.
|