agentic-dev/.claude/skills/evaluate-library/SKILL.md
Danijel Martinek b10ccba927 feat(scripts): add evaluate-library skill + supporting files
Adds the /evaluate-library skill runbook at .claude/skills/evaluate-library/
with SKILL.md (8-filter + 3-prompt protocol, collect-cheap-skip-expensive
ordering, trace-write step, skip sentinel), POLICY.md (ADR-022 summary
≤2 pages), TRACE-TEMPLATE.md (complete YAML frontmatter + 11 headings in
order), and EXAMPLES/ with one approved (clsx) and one rejected
(trpc-to-openapi, named-consumer: fail) worked trace.

Updates session-start.sh to surface the skill in session pointers.
The skill is auto-registered by the harness on SKILL.md creation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 05:45:43 +00:00


name description
evaluate-library Walk the 8-filter + 3-prompt library evaluation protocol for a named package, write the decision trace to docs/library-decisions/, and return pass/fail. Use when adding a runtime dependency to a feature or core package, or when the library-policy-nudge hook fires.
/evaluate-library <package-name> --tier <feature|core|app> --target <package-path>

All three arguments are required. The library-policy-nudge hook emits this exact invocation. For app-tier packages, evaluation still runs but a trace is optional (author's call per ADR-022 §1).
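As a rough illustration of the invocation contract above, the sketch below parses the three required arguments and flags whether a trace is mandatory. The function name `parseInvocation`, the `traceRequired` field, and the example paths are hypothetical; the skill itself is a runbook and does not necessarily ship a parser like this.

```typescript
// Hypothetical sketch: validates "<package-name> --tier <t> --target <path>".
type Tier = "feature" | "core" | "app";

interface Invocation {
  packageName: string;
  tier: Tier;
  target: string;
  traceRequired: boolean; // assumption: false only for app-tier, where the trace is optional
}

function asTier(value: string): Tier {
  if (value === "feature" || value === "core" || value === "app") return value;
  throw new Error("--tier must be feature, core, or app");
}

function parseInvocation(args: string[]): Invocation {
  let packageName = "";
  let tier = "";
  let target = "";
  for (let i = 0; i < args.length; i++) {
    if (args[i] === "--tier") {
      tier = args[++i] || "";
    } else if (args[i] === "--target") {
      target = args[++i] || "";
    } else if (packageName === "") {
      packageName = args[i]; // first positional argument is the package name
    }
  }
  if (packageName === "" || tier === "" || target === "") {
    throw new Error("package name, --tier, and --target are all required");
  }
  const parsedTier = asTier(tier);
  return { packageName, tier: parsedTier, target, traceRequired: parsedTier !== "app" };
}
```

For example, `parseInvocation(["clsx", "--tier", "app", "--target", "apps/web"])` would still return a valid invocation, with `traceRequired` set to `false`.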

Overview

Walk eight hard auto-reject filters in collect-cheap-skip-expensive order, then answer three discussion prompts. Write the trace unconditionally at the end — including for rejections. A rejection trace is a permanent record that prevents future agents from re-litigating the same decision.

Phase 1 — Cheap filters (always run to completion, even if one fails)

Run all four cheap filters regardless of their outcomes. Record each result before moving to Phase 2.

Filter 1: license

Command: node -e "const p = JSON.parse(require('fs').readFileSync('./node_modules/<pkg>/package.json','utf8')); console.log(p.license)"

Allowlist: MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, MPL-2.0.

Result values: the SPDX identifier (e.g. MIT) if allowed, or <SPDX-id> (rejected) if outside the allowlist. Anything outside the allowlist is an automatic reject but does not stop Phase 1.
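The mapping from a license string to a result value could be sketched as follows. The names `LICENSE_ALLOWLIST` and `licenseResult` are illustrative, and the sketch assumes an exact-match comparison: compound SPDX expressions such as `(MIT OR Apache-2.0)` are not parsed and would be reported as rejected, so they need manual review.

```typescript
// Allowlist copied from the filter description above.
const LICENSE_ALLOWLIST: string[] = [
  "MIT",
  "Apache-2.0",
  "BSD-2-Clause",
  "BSD-3-Clause",
  "ISC",
  "MPL-2.0",
];

// Returns the SPDX id if allowed, or "<SPDX-id> (rejected)" otherwise.
// Assumption: exact string match only; SPDX expressions are not interpreted.
function licenseResult(spdxId: string): string {
  const allowed = LICENSE_ALLOWLIST.indexOf(spdxId) !== -1;
  return allowed ? spdxId : `${spdxId} (rejected)`;
}
```

A rejected value is still recorded and Phase 1 continues with the remaining cheap filters.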

Filter 2: types

Check whether TypeScript types ship with the package or via @types/<pkg>:

ls node_modules/<pkg>/index.d.ts 2>/dev/null && echo native || npm info @types/<pkg> version 2>/dev/null | head -1

Result values: native (ships its own .d.ts), @types/<pkg> (community types available), or none (auto-reject — un-typed library shifts maintenance cost to the feature).

Filter 3: shadow-check

Check whether this library duplicates a must-have already locked in the workspace. Locked must-haves: zod (validation), inversify (DI, ADR-002), payload (CMS), @trpc/server (API layer), superjson (serialisation), reflect-metadata (DI metadata).

Command (run from the workspace root): grep -E '"(zod|inversify|payload|@trpc/server|superjson|reflect-metadata)"' package.json

Result values: pass (no shadow), fail (exact duplicate of a locked dep), or "shadows <x>" (functional parallel that would leave two libraries doing the same job). Both fail and "shadows <x>" are auto-rejects. Replacing a locked dep requires a separate ADR with a consequences analysis, not a parallel adoption.

Filter 4: boundary-fit

Confirm the dependency does not violate ESLint boundary-tag rules for the target tier (ADR-006, ADR-010, ADR-017).

Key rules:

  • Feature packages cannot import @sentry/* or @opentelemetry/sdk-* directly — those are reserved for core (ADR-017 §4).
  • No package may import across feature boundaries without going through the event bus or tRPC.
  • Optional core packages can only be imported by apps and core-composition-tagged packages.

Check by reviewing what the proposed library's transitive imports would bring in and whether any violate the boundary ruleset.

Result values: pass or fail.


After Phase 1: tally the results. If any cheap filter failed, the overall decision is already rejected, but proceed to Phase 2 anyway: the expensive filters still run, subject to Phase 2's own short-circuit rule, because their results inform the full record. If all cheap filters passed, Phase 2 determines the final decision.
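The Phase 1 tally can be pictured as collecting every failing cheap filter rather than stopping at the first one. This is a minimal sketch: the `CheapResults` shape and the `tallyPhase1` name are assumptions for illustration, and the result strings follow the result values listed for each filter above.

```typescript
// Hypothetical container for the four cheap-filter results.
interface CheapResults {
  license: string;     // SPDX id, or "<SPDX-id> (rejected)"
  types: string;       // "native", "@types/<pkg>", or "none"
  shadowCheck: string; // "pass", "fail", or "shadows <x>"
  boundaryFit: string; // "pass" or "fail"
}

interface Phase1Tally {
  failed: string[];  // names of every cheap filter that failed
  rejected: boolean; // true if at least one cheap filter failed
}

function tallyPhase1(results: CheapResults): Phase1Tally {
  const failed: string[] = [];
  // All four checks always run; nothing returns early.
  if (/ \(rejected\)$/.test(results.license)) failed.push("license");
  if (results.types === "none") failed.push("types");
  if (results.shadowCheck !== "pass") failed.push("shadow-check");
  if (results.boundaryFit !== "pass") failed.push("boundary-fit");
  return { failed, rejected: failed.length > 0 };
}
```

Collecting all failures in one pass means a rejection trace lists every cheap reason at once, which is the point of running Phase 1 to completion.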

Phase 2 — Expensive filters (short-circuit after first reject)

Run in order. On the first failure, set the remaining filter results to skip and go straight to the Trace write step (the three discussion prompts are still answered in the trace).

Filter 5: maintenance

Check last release date and recent PR/issue activity:

npm info <pkg> time.modified
npm info <pkg> time | tail -5

Result values:

  • active: last release less than 18 months ago and PR/issue activity within the last 12 months
  • dormant: stable and feature-complete rather than actively developed (acceptable for finished libraries like reflect-metadata)
  • abandoned: last release 18 or more months ago, or no PR/issue activity for 12 or more months, unless the library qualifies as dormant; auto-reject, and short-circuit the remaining expensive filters

On abandoned → set cve-scan, eu-residency, named-consumer to skip → write trace.
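The thresholds above could be applied roughly as follows. This is a sketch under stated assumptions: the text does not say how "dormant" is distinguished mechanically from "abandoned", so the `finishedLibrary` flag here stands in for the evaluator's own judgment, and `wholeMonthsBetween` and `classifyMaintenance` are hypothetical helper names.

```typescript
type Maintenance = "active" | "dormant" | "abandoned";

// Number of whole calendar months from `earlier` to `later`, computed in UTC.
function wholeMonthsBetween(earlier: Date, later: Date): number {
  let months =
    (later.getUTCFullYear() - earlier.getUTCFullYear()) * 12 +
    (later.getUTCMonth() - earlier.getUTCMonth());
  // The current month only counts once the day of month has been reached.
  if (later.getUTCDate() < earlier.getUTCDate()) months -= 1;
  return months;
}

function classifyMaintenance(
  lastRelease: Date,
  lastActivity: Date,
  now: Date,
  finishedLibrary: boolean, // assumption: a human call, e.g. true for reflect-metadata
): Maintenance {
  const releaseAge = wholeMonthsBetween(lastRelease, now);
  const activityAge = wholeMonthsBetween(lastActivity, now);
  if (releaseAge < 18 && activityAge < 12) return "active";
  // Outside the thresholds: acceptable only if the library is considered finished.
  return finishedLibrary ? "dormant" : "abandoned";
}
```

The dates themselves would come from the `npm info <pkg> time` output and the repository's PR/issue history.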

Filter 6: cve-scan

pnpm audit --audit-level=moderate 2>&1 | head -40

Result values: clean (no advisories), an advisory ID like GHSA-xxxx-xxxx-xxxx (accepted risk — document in accepted-cves frontmatter), or fail (open advisory not accepted → auto-reject; short-circuit remaining expensive filters).

On fail → set eu-residency, named-consumer to skip → write trace.

Filter 7: eu-residency

Applies only if the library transmits user data, telemetry, business state, or secrets to a vendor-controlled endpoint by default. Examples: analytics SDKs, error-tracking clients, AI APIs, log aggregation services.

Exemptions: pure in-process libraries (no network calls) and build-time-only tools (result: n/a), and self-hostable software where the operator controls the endpoint (result: self-hostable).

For non-exempt libraries: verify the vendor offers an EU data region AND that the integration in target is configured to use it.

Result values: ok (vendor offers EU region, integration configured), n/a (no data transmission), self-hostable (operator-controlled endpoint), or fail (auto-reject; short-circuit named-consumer).

On fail → set named-consumer to skip → write trace.

Filter 8: named-consumer

Answer: Who calls this code path today, or who is blocked waiting for it?

A named consumer is a concrete call site that exists now or a feature blocked on this capability today. "We might want this later", "external clients could use this", and "it would be nice to have" are not named consumers.

If the only possible callers are hypothetical or future, the result is fail (auto-reject).

Result value: pass or fail.
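The Phase 2 ordering and short-circuit behaviour can be sketched as a loop that stops at the first auto-reject and leaves every later filter at skip. The `evaluate` callback, `runExpensiveFilters`, and the `REJECT_VALUE` table are illustrative names; the reject values are taken from the result values listed for each filter above.

```typescript
type ExpensiveFilter = "maintenance" | "cve-scan" | "eu-residency" | "named-consumer";

// Evaluation order as described in Phase 2.
const ORDER: ExpensiveFilter[] = ["maintenance", "cve-scan", "eu-residency", "named-consumer"];

// The result value that triggers an auto-reject for each filter.
const REJECT_VALUE: Record<ExpensiveFilter, string> = {
  "maintenance": "abandoned",
  "cve-scan": "fail",
  "eu-residency": "fail",
  "named-consumer": "fail",
};

function runExpensiveFilters(
  evaluate: (filter: ExpensiveFilter) => string, // hypothetical: performs one filter's check
): Record<ExpensiveFilter, string> {
  // Every filter starts as "skip"; only evaluated filters are overwritten.
  const results: Record<ExpensiveFilter, string> = {
    "maintenance": "skip",
    "cve-scan": "skip",
    "eu-residency": "skip",
    "named-consumer": "skip",
  };
  for (const filter of ORDER) {
    const value = evaluate(filter);
    results[filter] = value;
    if (value === REJECT_VALUE[filter]) break; // short-circuit the remaining filters
  }
  return results;
}
```

Initialising every slot to skip means the trace frontmatter is complete even when the loop exits early.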


Skip sentinel

When a filter is short-circuited (not evaluated), write skip for its frontmatter value. The Zod schema validates approved traces end-to-end; rejected/partial traces may carry skip in fields that would normally require an enum value. The pre-commit check only validates that approved traces exist for new deps — partial traces are informational records.

Three discussion prompts

Answer all three in the trace, regardless of filter outcome. These are not auto-reject filters; any answer is acceptable with justification.

Prompt: replaces

What existing library or approach does this replace? New-and-old running in parallel is a smell — name the thing being retired and the retirement plan, or explain why parallel adoption is intentional and time-bounded.

Prompt: migration-cost-out

What does ripping this back out look like 18 months from now? Rate: mechanical (swap one package, update call sites), hard (scattered integration points, data-format dependencies), or impossible (vendor lock-in, protocol coupling). Higher cost raises the bar for adoption.

Prompt: alternatives-considered

Name at least two alternatives evaluated before choosing this library. For core-tier adoptions, this section is also duplicated into the companion ADR. If no alternatives exist, explain why (e.g., the library is the de-facto standard with no viable substitutes).


Trace write step

Write the trace unconditionally at evaluation end — even for rejections, even for partial traces.

Path: docs/library-decisions/<YYYY-MM-DD>-<package-name>.md

Use today's date. Use the TRACE-TEMPLATE.md in this directory as the structural guide.
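Building the trace path from the pattern above might look like the sketch below. `tracePath` is a hypothetical helper; it assumes the date is taken in UTC, and it does not address scoped package names (such as `@scope/name`), whose handling the text does not specify.

```typescript
// Builds docs/library-decisions/<YYYY-MM-DD>-<package-name>.md for a given day.
function tracePath(packageName: string, today: Date): string {
  // toISOString() is always UTC; the first 10 characters are YYYY-MM-DD.
  const isoDate = today.toISOString().slice(0, 10);
  return `docs/library-decisions/${isoDate}-${packageName}.md`;
}
```

If the team works across time zones, it may be worth agreeing on whether "today" means local time or UTC, since the two can differ around midnight.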

Frontmatter rules:

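  • decision: approved only if none of the eight filters produced an auto-reject result (values such as dormant, n/a, self-hostable, or an accepted advisory ID count as passing). Otherwise decision: rejected.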
  • adr: null for feature-tier. For core-tier approvals, coordinate the ADR slug before writing (adr: adr-NNN).
  • verification-commands: include the literal commands run for each filter, one per line.
  • accepted-cves: [] (empty unless you accepted a specific advisory).
  • For skipped expensive filters, write skip for the frontmatter value and omit the prose section body or note "Not evaluated — skipped due to earlier rejection."

After writing the trace:

  • For approved traces: confirm the trace is staged in the same commit as the package.json change. The pre-commit hook validates this.
  • For rejected traces: stage the trace file alone. Do not run pnpm add <pkg>.

After completing the evaluation, emit a short summary in this format:

/evaluate-library result: <approved|rejected> — <package>@<version> (<tier>)
Rejection filters (if any): <filter names>
Trace written to: docs/library-decisions/<date>-<package>.md
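The summary format above could be produced by a small formatter like the one below. `formatSummary` is a hypothetical name, and omitting the "Rejection filters" line when nothing was rejected is an assumption about how "(if any)" should be read.

```typescript
function formatSummary(
  decision: "approved" | "rejected",
  pkg: string,
  version: string,
  tier: string,
  rejectionFilters: string[],
  tracePath: string,
): string {
  // \u2014 is the dash used in the summary format's first line.
  const lines: string[] = [
    `/evaluate-library result: ${decision} \u2014 ${pkg}@${version} (${tier})`,
  ];
  // Assumption: the rejection line is dropped entirely when no filter rejected.
  if (rejectionFilters.length > 0) {
    lines.push(`Rejection filters: ${rejectionFilters.join(", ")}`);
  }
  lines.push(`Trace written to: ${tracePath}`);
  return lines.join("\n");
}
```

For a rejection, the filter names passed in would be the same ones recorded as failing in the trace frontmatter.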