Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.
Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
-> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
-> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
-> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
-> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
-> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
-> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
-> docs/work/prds/ci-security-and-supply-chain.prd.md
Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.
System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
new path.
template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).
Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
bare-slug id field + ISO created: timestamp.
Doc reference updates: glossary, runbook,
agent-first-workflow-and-conformance, reviewer prompt, ADR-020,
ADR-022, ADR-023 all point at the new paths/slugs.
ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)
Status: Accepted Date: 2026-05-13 Builds on: ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation) PRD: docs/work/prds/coverage-architecture.prd.md
Context
ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80 statements/branches/functions/lines), and stricter per-layer bands (100% on entities/, application/use-cases/, interface-adapters/controllers/). The ESLint conformance rule usecase-must-have-test-file enforces that a test file exists. CI runs pnpm test -- --coverage and uploads **/coverage/lcov.info as an artifact.
This leaves five gaps that matter especially in an agent-first repo:
- No diff-coverage gate. A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
- No aggregate visibility. N separate lcov files; no merged view, no trend over time.
- Threshold declarations are duplicated across 5+ vitest.config.ts files. Drift is mechanical to spot (we spotted it during the 2026-05-13 brainstorm: @repo/media had no coverage: block at all; @repo/navigation failed its declared layer thresholds in entities + controllers).
- 100% coverage with weak assertions is invisible. Coverage doesn't measure whether tests would catch real regressions. Mutation testing, explicitly deferred in ADR-011, is the next signal.
- Coverage data isn't agent-readable. The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".
Decision
1. Adopt a 4-layer coverage architecture mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):
| Layer | Catches | Latency | Surface |
|---|---|---|---|
| L0 Per-layer vitest thresholds | Drift below declared bands | ~5–30s per package | pnpm test --coverage (existing) |
| L1 Diff coverage | Changed line not exercised | ~5s after L0 | pnpm coverage:diff; CI gate; dispatch post-task |
| L2 Aggregate trend | Drift across the codebase over time | ~10s | pnpm coverage:aggregate; committed coverage/summary.json |
| L3 Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | pnpm mutate; on-demand, not default pnpm test |
Each layer answers a distinct question; none replaces the others.
2. Make feature.manifest.ts the single source of truth for coverage expectations. A new coverage: section per feature:
```ts
// In feature.manifest.ts: one property of the feature's manifest object.
coverage: {
  bands: {
    "entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
}
```
Three readers consume the manifest:
- Vitest: vitest.config.ts imports the manifest's coverage section and emits its thresholds. The duplicated block in 5 per-feature vitest configs goes away.
- assertFeatureConformance: reads coverage/lcov.info for the package at boot and asserts each band. It degrades gracefully when USE_DEV_SEED=true (warns rather than throws when lcov is absent).
- pnpm coverage:diff: uses baseline for uncategorized files; stricter layer bands override it for paths matching their glob.
This eliminates the duplication that caused the @repo/media drift and centralizes one decision in one place per feature.
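The sketch below shows how one manifest section could feed two of the readers listed above. Everything in it is illustrative: the CoverageSection type, the hypothetical LAYER_GLOBS mapping (including its src/ prefix), and the helper names toVitestThresholds and bandFor are not the repo's actual helper. The output is shaped like the object Vitest's coverage.thresholds option takes in recent versions: global numbers plus per-glob objects.

```typescript
// Hypothetical types mirroring the coverage: section of feature.manifest.ts.
type Metrics = { statements: number; branches: number; functions: number; lines: number };

interface CoverageSection {
  // Layer bands plus the baseline, nested as in the manifest example.
  bands: Record<string, Metrics> & { baseline: Metrics };
  mutationTargets: string[];
}

// Hypothetical mapping from manifest layer names to source globs.
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// Global numbers plus per-glob objects, as coverage.thresholds expects.
type VitestThresholds = Record<string, number | Metrics>;

function toVitestThresholds(section: CoverageSection): VitestThresholds {
  const { baseline, ...layers } = section.bands;
  // The baseline becomes the global thresholds.
  const thresholds: VitestThresholds = { ...baseline };
  for (const [layer, metrics] of Object.entries(layers)) {
    const glob = LAYER_GLOBS[layer];
    if (glob === undefined) throw new Error(`No glob mapping for layer "${layer}"`);
    // Each stricter band applies only to files matching its glob.
    thresholds[glob] = { ...metrics };
  }
  return thresholds;
}

// Resolve the band for one file: a matching layer wins, otherwise the baseline.
function bandFor(path: string, section: CoverageSection): Metrics {
  const { baseline, ...layers } = section.bands;
  for (const [layer, metrics] of Object.entries(layers)) {
    const glob = LAYER_GLOBS[layer];
    // Simplified matching: treat "dir/**" as the directory prefix "dir/".
    if (glob !== undefined && path.startsWith(glob.replace(/\*\*$/, ""))) return metrics;
  }
  return baseline;
}

const exampleCoverage: CoverageSection = {
  bands: {
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    controllers: { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
};
```

In a vitest.config.ts this would plug in roughly as coverage: { thresholds: toVitestThresholds(manifest.coverage) }. To stay dependency-free, bandFor uses a plain directory-prefix match. A real implementation would more likely use a glob matcher such as picomatch.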
3. Diff coverage is cover-the-diff, not cover-the-new-code. Every changed executable line must have execution-count > 0 against the merged lcov. Modified-but-not-new lines count too — catches "agent edited code, didn't update the test." Allowlist: *.test.ts, *.config.*, *.md, *.json, *.mjs, plus the per-package exclude lists.
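A minimal sketch of the cover-the-diff rule follows. It assumes lcov paths and changed-file paths are both normalized to repo-relative form. It also treats a changed line with no DA record as non-executable. The function names are hypothetical, not the API of scripts/coverage/diff.mjs.

```typescript
// Execution counts per file: path -> (line number -> hit count).
type LineHits = Map<string, Map<number, number>>;

interface Uncovered {
  file: string;
  line: number;
}

// Parse SF/DA records from one or more concatenated lcov files. Counts for the
// same file and line are summed, so several packages' lcov can be merged.
function parseLcov(text: string): LineHits {
  const hits: LineHits = new Map();
  let current: Map<number, number> | undefined;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const file = line.slice(3);
      current = hits.get(file) ?? new Map<number, number>();
      hits.set(file, current);
    } else if (line.startsWith("DA:") && current !== undefined) {
      // DA:<line>,<count>[,<checksum>]
      const [lineNo, count] = line.slice(3).split(",");
      const n = Number(lineNo);
      current.set(n, (current.get(n) ?? 0) + Number(count));
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return hits;
}

// Allowlist from decision 3; per-package excludes can be passed in as well.
const ALLOWLIST: RegExp[] = [/\.test\.ts$/, /\.config\.[^/]+$/, /\.md$/, /\.json$/, /\.mjs$/];

// Cover-the-diff: every changed line (new or modified) that lcov marks as
// executable must have a hit count > 0.
function uncoveredChangedLines(
  changed: Map<string, number[]>,
  hits: LineHits,
  extraExcludes: RegExp[] = [],
): Uncovered[] {
  const skip = [...ALLOWLIST, ...extraExcludes];
  const out: Uncovered[] = [];
  for (const [file, lines] of changed) {
    if (skip.some((re) => re.test(file))) continue;
    const fileHits = hits.get(file);
    for (const line of lines) {
      if (fileHits === undefined) {
        // File never appeared in lcov: conservatively report every changed line.
        out.push({ file, line });
        continue;
      }
      // No DA record (undefined) is treated as a non-executable line.
      if (fileHits.get(line) === 0) out.push({ file, line });
    }
  }
  return out;
}
```

The changed-line map would typically come from parsing the hunk headers of git diff --unified=0 against the merge base. That step is omitted here. Because modified lines count as well as new ones, editing a use case without touching its test still fails.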
4. Aggregate trend ships in-tree, not via SaaS. coverage/summary.json is committed on merge to main. Trend readable via git log -- coverage/summary.json. No external service dependency; the dispatch loop can read history without a network call.
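The aggregation step can be sketched by summing lcov's per-record totals across every package: LF/LH for lines, BRF/BRH for branches, FNF/FNH for functions. This is an assumption about the approach, and the names and Summary shape are illustrative rather than the repo's aggregate script. lcov has no statement counters, so this sketch reports line coverage in the statements field, another labeled assumption.

```typescript
// Summary totals from lcov's LF/LH, BRF/BRH and FNF/FNH records.
interface Totals {
  lf: number; lh: number; brf: number; brh: number; fnf: number; fnh: number;
}

const TOTAL_KEYS = new Map<string, keyof Totals>([
  ["LF", "lf"], ["LH", "lh"], ["BRF", "brf"], ["BRH", "brh"], ["FNF", "fnf"], ["FNH", "fnh"],
]);

// Sum the totals of every record in every package's lcov.info text.
function sumLcovTotals(lcovTexts: string[]): Totals {
  const t: Totals = { lf: 0, lh: 0, brf: 0, brh: 0, fnf: 0, fnh: 0 };
  for (const text of lcovTexts) {
    for (const raw of text.split(/\r?\n/)) {
      const line = raw.trim();
      const colon = line.indexOf(":");
      if (colon < 0) continue;
      const key = TOTAL_KEYS.get(line.slice(0, colon));
      if (key !== undefined) t[key] += Number(line.slice(colon + 1));
    }
  }
  return t;
}

// Percentage rounded to two decimals; an empty denominator counts as fully covered.
const pct = (hit: number, found: number): number =>
  found === 0 ? 100 : Math.round((hit / found) * 10000) / 100;

interface Summary { statements: number; branches: number; functions: number; lines: number }

function toSummary(t: Totals): Summary {
  const lines = pct(t.lh, t.lf);
  return {
    // Assumption: lcov carries no statement records, so line coverage stands in.
    statements: lines,
    branches: pct(t.brh, t.brf),
    functions: pct(t.fnh, t.fnf),
    lines,
  };
}
```

The resulting object would be serialized with JSON.stringify(summary, null, 2) into coverage/summary.json by the main-branch workflow. git log -p -- coverage/summary.json then reads as the trend.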
5. Mutation testing is opt-in and narrowly scoped. Stryker with @stryker-mutator/vitest-runner, runs on entities/ + application/use-cases/ only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of pnpm test. Nightly GH Action surfaces score drift > 5%.
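The score and the two nightly checks can be sketched as follows. The formula follows Stryker's documented definition of mutation score: killed and timed-out mutants count as detected, survivors and uncovered mutants as undetected, and error or ignored mutants are left out. The function names, the MutationCheck shape, and reading "drift > 5%" as more than 5 percentage points in either direction are assumptions.

```typescript
// Stryker mutant states relevant to the score (subset of Stryker's report statuses).
type MutantStatus =
  | "Killed" | "Timeout" | "Survived" | "NoCoverage"
  | "CompileError" | "RuntimeError" | "Ignored";

// score = detected / (detected + undetected) * 100
function mutationScore(statuses: MutantStatus[]): number {
  let detected = 0;
  let undetected = 0;
  for (const s of statuses) {
    if (s === "Killed" || s === "Timeout") detected++;
    else if (s === "Survived" || s === "NoCoverage") undetected++;
    // CompileError, RuntimeError and Ignored are excluded from the denominator.
  }
  const valid = detected + undetected;
  // Sketch choice: a feature with nothing to mutate counts as passing.
  return valid === 0 ? 100 : (detected * 100) / valid;
}

interface MutationCheck {
  score: number;
  belowThreshold: boolean;
  drifted: boolean;
}

// threshold: the per-feature default of 80 (tunable per manifest).
// maxDrift: assumed to mean percentage points versus the previous nightly run.
function checkMutation(
  score: number,
  previous: number | undefined,
  threshold = 80,
  maxDrift = 5,
): MutationCheck {
  return {
    score,
    belowThreshold: score < threshold,
    drifted: previous !== undefined && Math.abs(score - previous) > maxDrift,
  };
}
```

Keeping the threshold and drift checks separate lets the nightly Action report a feature that still passes 80% but has fallen sharply since the previous run.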
6. Output format is machine-first. pnpm coverage:diff emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
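A minimal sketch of the two-channel contract follows. The DiffCoverageReport shape and all function names are hypothetical, since the real output fields of pnpm coverage:diff are not specified here. The sketch relies only on the fact that in Node console.log writes to stdout and console.error to stderr.

```typescript
interface UncoveredLine { file: string; line: number }

// Hypothetical report shape; the real field names may differ.
interface DiffCoverageReport {
  ok: boolean;
  changedLines: number;
  uncovered: UncoveredLine[];
}

function buildReport(changedLines: number, uncovered: UncoveredLine[]): DiffCoverageReport {
  return { ok: uncovered.length === 0, changedLines, uncovered };
}

// Human-readable summary, intended for stderr.
function humanSummary(r: DiffCoverageReport): string {
  if (r.ok) return `diff coverage OK: ${r.changedLines} changed line(s) exercised`;
  const where = r.uncovered.map((u) => `  ${u.file}:${u.line}`).join("\n");
  return `diff coverage FAILED: ${r.uncovered.length} of ${r.changedLines} changed line(s) not exercised\n${where}`;
}

// Machine channel: exactly one JSON document on stdout. Human channel: stderr.
function emit(r: DiffCoverageReport): void {
  console.log(JSON.stringify(r));
  console.error(humanSummary(r));
}

// Consumer side (e.g. the dispatch loop): parse stdout and check the shape.
function parseReport(stdout: string): DiffCoverageReport {
  const value: unknown = JSON.parse(stdout);
  if (
    typeof value !== "object" || value === null ||
    typeof (value as { ok?: unknown }).ok !== "boolean" ||
    !Array.isArray((value as { uncovered?: unknown }).uncovered)
  ) {
    throw new Error("unexpected coverage:diff report shape");
  }
  return value as DiffCoverageReport;
}
```

Keeping prose off stdout means the consumer never has to scrape text. A human running the command in a terminal still sees the summary, because stderr is not redirected by default.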
Alternatives considered
- Codecov / Coveralls SaaS. Polished PR comments and trend dashboards, free for OSS. Rejected as the primary L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
- Cover-the-new-code instead of cover-the-diff. Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
- Keep thresholds in per-package vitest configs. Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
- Run mutation testing in default pnpm test. Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
- Mutation testing across all layers. Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
- Use ESLint or fallow for diff coverage. Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside pnpm test, not in pnpm lint or pnpm fallow.
- Boot-time coverage assertion is too heavy. Considered. Counter-argument: the assertion is O(features × lcov-file-size), small numbers, ~200ms. The graceful degradation in dev mode means contributors aren't blocked. The payoff (coverage drift caught at the same latency as TypeScript brands) justifies the machinery.
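The boot-time check weighed above (and deferred as step 10 in the phasing table) can be sketched as follows. It assumes per-layer percentages have already been computed from the package's coverage/lcov.info, or are undefined when that file is absent. The devSeed flag stands in for USE_DEV_SEED=true. The function name and parameters are hypothetical, not the real assertFeatureConformance signature.

```typescript
type Metrics = { statements: number; branches: number; functions: number; lines: number };

const METRICS = ["statements", "branches", "functions", "lines"] as const;

// Hypothetical boot-time coverage check; `warn` is injectable for testing.
function assertCoverageBands(
  feature: string,
  bands: Record<string, Metrics>,
  actual: Record<string, Metrics> | undefined,
  devSeed: boolean,
  warn: (message: string) => void = console.warn,
): void {
  if (actual === undefined) {
    const message = `${feature}: coverage/lcov.info not found`;
    // Graceful degradation: dev-seed mode warns instead of blocking boot.
    if (devSeed) {
      warn(`${message}; skipping coverage bands`);
      return;
    }
    throw new Error(message);
  }
  const failures: string[] = [];
  for (const [layer, band] of Object.entries(bands)) {
    const got = actual[layer];
    if (got === undefined) {
      failures.push(`${layer}: no coverage data`);
      continue;
    }
    for (const metric of METRICS) {
      if (got[metric] < band[metric]) {
        failures.push(`${layer}.${metric}: ${got[metric]}% < ${band[metric]}%`);
      }
    }
  }
  if (failures.length > 0) {
    throw new Error(`${feature} coverage below declared bands:\n${failures.join("\n")}`);
  }
}
```

The comparison itself is O(bands) per feature. The O(features × lcov-file-size) cost cited above sits in producing `actual` from each lcov file, which is the part the deferral avoids duplicating.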
Consequences
Positive:
- Every PR/task is gated on covering its own diff. An agent can no longer get an untested slice past the CI step.
- One source of truth per feature for coverage expectations. The @repo/media-style "no coverage block" drift can't recur.
- Trend history lives in the repo. git log -- coverage/summary.json answers "how has coverage moved over the last quarter?" without leaving the editor.
- Mutation testing on the highest-leverage layers (entities + use-cases, the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
- Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 adds coverage as a row alongside TS brands / ESLint / boot / pnpm conformance / fallow.
Negative:
- Implementation surface is non-trivial: 6–8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, ADR + guide + glossary + generator + hook updates.
- The boot-time assertion adds a small dependency on coverage/lcov.info existing. Graceful degradation in dev mode handles this, but the implementation needs care.
- coverage/summary.json committed on merge introduces a small CI permissions surface (contents: write) gated to the main-branch workflow.
- Mutation testing is slow. The nightly cadence is the compromise; on-demand pnpm mutate is opt-in and rarely run in practice.
Implementation phasing
Shipped as a single epic over 10 commits on 2026-05-13. Per-step state:
| # | Step | Commit | Status |
|---|---|---|---|
| 1 | Manifest schema + helper + auth proof-of-concept | f7baa8b | ✅ Shipped |
| 2 | Vitest auto-derive (helper + DEFAULT_COVERAGE_BANDS) | f7baa8b + rollouts | ✅ Shipped (all 5 features wired) |
| 3 | L1 diff coverage (scripts/coverage/diff.mjs) | 412d994 | ✅ Shipped |
| 4 | L2 aggregate (scripts/coverage/aggregate.mjs + summary.json) | bd5a077 | ✅ Shipped |
| 5 | CI integration (validate gate + snapshot workflow) | 39e33eb | ✅ Shipped |
| 6 | Helper rollout to blog + marketing-pages | 15db9c4 | ✅ Shipped |
| 7 | Docs + generator + hook rollout | 4dce1df + f4254aa | ✅ Shipped |
| 8 | L3 mutation testing (Stryker + nightly Action) | 6428f10 | ✅ Shipped (auth proof-of-concept; other features can add a stryker.config.json with extends: "@repo/core-testing/stryker.base.json") |
| 9 | L0 unification (close test gaps in nav + media + marketing-pages) | bf0b049 | ✅ Shipped; all 5 features hit declared bands |
| 10 | Boot-time assertFeatureConformance coverage check | — | ⏸ Deferred. Duplicates L0's structural enforcement when both readers derive from the same manifest source of truth; the drift it was supposed to catch is mechanically impossible. Revisit if a concrete need emerges. |
Repo-wide state at shipping (coverage/summary.json): statements 95.87% / branches 88.91% / functions 100% / lines 95.87%. All five features pass their declared bands on entities/use-cases/controllers (100% statements, functions, and lines; 95% branches on use-cases and controllers, 100% on entities).
Related
- ADR-006 — vertical-feature-packages
- ADR-011 — TDD foundation
- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads pnpm coverage:diff)
- PRD coverage-architecture — implementation seed