Architecture record for the agent-first coverage initiative seeded by the 2026-05-13 PRD. Captures the durable decisions:

- 4-layer architecture (L0 vitest, L1 diff, L2 aggregate, L3 mutation)
- Manifest-driven coverage band as single source of truth (vitest + assertFeatureConformance + pnpm coverage:diff all read from it)
- Cover-the-diff (changed lines), not cover-the-new-code
- Committed coverage/summary.json (no SaaS), trend via git log
- Mutation testing scoped to entities + use-cases, on-demand only
- Machine-first output format (JSON stdout, human stderr)

Glossary gets a new "Coverage" section with 7 entries (coverage band, L0-L3 layers, diff coverage, mutation testing, mutation score, coverage/summary.json), plus two relationship rows and a flagged ambiguity for "coverage" qualifiers.

prompt-context.sh hook gets a 9th keyword group — when a prompt mentions coverage / uncovered / lcov / mutation / stryker, the relevant ADR + guide path are injected as additional context for the turn.

This is the documentation layer of the coverage epic. Implementation (manifest schema, vitest auto-derive, scripts, boot assertion, mutation tooling) lands in subsequent stories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)

**Status:** Accepted
**Date:** 2026-05-13
**Builds on:** ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
**PRD:** docs/work/prds/2026-05-13-coverage-architecture.prd.md
## Context

ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80) and stricter per-layer bands (100% on `entities/`, `application/use-cases/`, `interface-adapters/controllers/`). The ESLint conformance rule `usecase-must-have-test-file` enforces _that a test file exists_. CI runs `pnpm test -- --coverage` and uploads `**/coverage/lcov.info` as an artifact.

This leaves five gaps that matter especially in an agent-first repo:

1. **No diff-coverage gate.** A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
2. **No aggregate visibility.** N separate lcov files; no merged view, no trend over time.
3. **Threshold declarations are duplicated** across 5+ `vitest.config.ts` files. Drift is mechanical to spot; the 2026-05-13 brainstorm found two cases: `@repo/media` had no `coverage:` block at all, and `@repo/navigation` failed its declared layer thresholds in entities + controllers.
4. **100% coverage with weak assertions is invisible.** Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
5. **Coverage data isn't agent-readable.** The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".
## Decision

**1. Adopt a 4-layer coverage architecture** mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):

| Layer                              | Catches                                              | Latency            | Surface                                                      |
| ---------------------------------- | ---------------------------------------------------- | ------------------ | ------------------------------------------------------------ |
| **L0** Per-layer vitest thresholds | Drift below declared bands                           | ~5–30s per package | `pnpm test --coverage` (existing)                            |
| **L1** Diff coverage               | Changed line not exercised                           | ~5s after L0       | `pnpm coverage:diff`; CI gate; dispatch post-task            |
| **L2** Aggregate trend             | Drift across the codebase over time                  | ~10s               | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing            | Tests that exist + execute the code + assert nothing | Minutes            | `pnpm mutate`; on-demand, not default `pnpm test`            |

Each layer answers a distinct question; none replaces the others.

**2. Make `feature.manifest.ts` the single source of truth for coverage expectations.** A new `coverage:` section per feature:

```ts
coverage: {
  bands: {
    "entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
}
```

Three readers consume the manifest:

- **Vitest** — `vitest.config.ts` imports the manifest's `coverage` and emits its `thresholds`. The duplicated block in 5 per-feature vitest configs goes away.
- **`assertFeatureConformance`** — reads `coverage/lcov.info` for the package at boot and asserts each band. Graceful degradation in `USE_DEV_SEED=true` (warns rather than throws when lcov is absent).
- **`pnpm coverage:diff`** — uses `baseline` for uncategorized files; stricter layer bands override per matching path glob.

This eliminates the duplication that caused the `@repo/media` drift and centralizes one decision in one place per feature.
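
As a rough illustration of the Vitest reader, the sketch below turns a manifest `coverage` section into a Vitest `thresholds` object, using Vitest's support for glob-keyed thresholds. The type names, the `toVitestThresholds` helper, and the band-to-glob mapping in `LAYER_GLOBS` are all assumptions made for this example; the ADR does not fix them.

```ts
// Hypothetical types: the ADR fixes the manifest shape, not these names.
type Band = { statements: number; branches: number; functions: number; lines: number };
type CoverageManifest = {
  bands: Record<string, Band> & { baseline: Band };
  mutationTargets: string[];
};

// Assumed mapping from band name to path glob; the real package layout may differ.
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// Baseline becomes the top-level thresholds; each layer band becomes a glob-keyed entry.
function toVitestThresholds(coverage: CoverageManifest): Record<string, number | Band> {
  const { baseline, ...layers } = coverage.bands;
  const thresholds: Record<string, number | Band> = { ...baseline };
  for (const [layer, band] of Object.entries(layers)) {
    const glob = LAYER_GLOBS[layer];
    if (!glob) throw new Error(`No path glob registered for band "${layer}"`);
    thresholds[glob] = band;
  }
  return thresholds;
}

// Intended use inside a feature's vitest.config.ts (not executed here):
//   test: { coverage: { provider: "v8", thresholds: toVitestThresholds(manifest.coverage) } }
```

Throwing on an unmapped band name is a deliberate choice in this sketch: a band that silently maps to no files would reintroduce the kind of drift the manifest is meant to remove.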

**3. Diff coverage is cover-the-diff, not cover-the-new-code.** Every changed _executable_ line must have an execution count > 0 in the merged lcov. Modified-but-not-new lines count too, which catches "agent edited code, didn't update the test." Allowlist (files exempt from the gate): `*.test.ts`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, plus the per-package exclude lists.
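
The cover-the-diff rule can be sketched as a small check over lcov `DA:<line>,<count>` records. This is an illustration, not the contents of `scripts/coverage/diff.mjs`: the function names and the shape of the `changed` input (file path to changed line numbers, as a diff parser might produce) are assumptions.

```ts
// Parse lcov text into: file path -> (line number -> execution count).
function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | undefined;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const file = line.slice(3);
      current = files.get(file) ?? new Map<number, number>();
      files.set(file, current);
    } else if (line.startsWith("DA:") && current) {
      const [lineNo, hits] = line.slice(3).split(",");
      const n = Number(lineNo);
      // Sum counts so duplicate records from merged reports accumulate.
      current.set(n, (current.get(n) ?? 0) + Number(hits));
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return files;
}

// Changed lines that are executable (have a DA record) but were never executed.
function uncoveredChangedLines(
  lcov: string,
  changed: Record<string, number[]>,
): Record<string, number[]> {
  const coverage = parseLcov(lcov);
  const result: Record<string, number[]> = {};
  for (const [file, lines] of Object.entries(changed)) {
    const hits = coverage.get(file);
    // Assumption: files with no lcov record are allowlisted or handled upstream.
    if (!hits) continue;
    const missed = lines.filter((n) => hits.get(n) === 0);
    if (missed.length > 0) result[file] = missed;
  }
  return result;
}
```

Lines without a `DA` record (comments, type declarations) are skipped, which matches the "changed _executable_ line" wording. How a real gate should treat a changed source file that is missing from lcov entirely is left open here.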

**4. Aggregate trend ships in-tree, not via SaaS.** `coverage/summary.json` is committed on merge to main. Trend readable via `git log -- coverage/summary.json`. No external service dependency; the dispatch loop can read history without a network call.
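
One way the aggregate step could reduce several per-package lcov reports to a single committed number is to sum lcov's `LF` (lines found) and `LH` (lines hit) records. The sketch below assumes that approach; the `Totals` shape and the `summarizeLines` name are hypothetical, and the real `coverage/summary.json` schema is not specified in this ADR.

```ts
// Hypothetical summary shape for one metric; the real summary.json may differ.
type Totals = { total: number; covered: number; pct: number };

// Sum LF/LH records across any number of lcov reports.
function summarizeLines(lcovReports: string[]): Totals {
  let total = 0;
  let covered = 0;
  for (const report of lcovReports) {
    for (const raw of report.split("\n")) {
      const line = raw.trim();
      if (line.startsWith("LF:")) total += Number(line.slice(3));
      else if (line.startsWith("LH:")) covered += Number(line.slice(3));
    }
  }
  // Round to two decimals so the committed file does not churn on float noise.
  const pct = total === 0 ? 100 : Math.round((covered / total) * 10000) / 100;
  return { total, covered, pct };
}
```

Writing the result with a stable serialization (for example `JSON.stringify(summary, null, 2)` plus a trailing newline) would keep the `git log -- coverage/summary.json` diffs readable.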

**5. Mutation testing is opt-in and narrowly scoped.** Stryker with `@stryker-mutator/vitest-runner` runs on `entities/` + `application/use-cases/` only. The default mutation-score threshold is 80% per feature (tunable per-manifest). It is not part of `pnpm test`. A nightly GH Action surfaces score drift > 5%.
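
To make the threshold and drift checks concrete, the sketch below computes a mutation score the way Stryker defines it (detected mutants over valid mutants, where detected means killed or timed out) and compares it with a previous score. The `MutantCounts` type, the function names, and the reading of "drift > 5%" as five percentage points are assumptions for illustration.

```ts
// Counts per mutant state; field names are hypothetical.
type MutantCounts = { killed: number; timeout: number; survived: number; noCoverage: number };

// Score = detected / valid * 100, with detected = killed + timeout.
function mutationScore(c: MutantCounts): number {
  const detected = c.killed + c.timeout;
  const valid = detected + c.survived + c.noCoverage;
  // Assumption: a feature with no valid mutants is treated as passing.
  return valid === 0 ? 100 : (detected / valid) * 100;
}

function checkMutation(
  current: MutantCounts,
  previousScore: number,
  threshold = 80, // default per-feature threshold from this ADR
  maxDrift = 5, // assumed to mean percentage points
): { score: number; belowThreshold: boolean; drifted: boolean } {
  const score = mutationScore(current);
  return {
    score,
    belowThreshold: score < threshold,
    drifted: previousScore - score > maxDrift,
  };
}
```

Only downward movement counts as drift in this sketch; whether the nightly job should also report improvements is a separate choice.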

**6. Output format is machine-first.** `pnpm coverage:diff` emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
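
A minimal sketch of the stdout/stderr split, assuming a hypothetical `DiffResult` shape (the actual JSON schema of `pnpm coverage:diff` is not defined in this ADR). Rendering is kept separate from writing so the machine payload can be checked without capturing streams.

```ts
// Hypothetical result shape; the real JSON schema may differ.
type DiffResult = { ok: boolean; uncovered: Record<string, number[]> };

// Build both renderings: one JSON line for machines, a short summary for humans.
function renderReport(result: DiffResult): { stdout: string; stderr: string } {
  const stdout = JSON.stringify(result);
  const stderr = result.ok
    ? "diff coverage: all changed lines covered"
    : [
        "diff coverage: uncovered changed lines",
        ...Object.entries(result.uncovered).map(
          ([file, lines]) => `  ${file}: ${lines.join(", ")}`,
        ),
      ].join("\n");
  return { stdout, stderr };
}

// In Node, console.log writes to stdout and console.error writes to stderr.
function emitReport(result: DiffResult): void {
  const { stdout, stderr } = renderReport(result);
  console.log(stdout);
  console.error(stderr);
}
```

With this split, a caller can pipe stdout straight into a JSON parser while the summary still appears in the terminal. A real script would presumably also set a non-zero exit code when `ok` is false.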

## Alternatives considered

- **Codecov / Coveralls SaaS.** Polished PR comments and trend dashboards, free for OSS. Rejected as the _primary_ L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
- **Cover-the-new-code instead of cover-the-diff.** Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
- **Keep thresholds in per-package vitest configs.** Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
- **Run mutation testing in default `pnpm test`.** Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
- **Mutation testing across all layers.** Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
- **Use ESLint or fallow for diff coverage.** Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside `pnpm test`, not in `pnpm lint` or `pnpm fallow`.
- **Boot-time coverage assertion is too heavy.** Considered. Counter-argument: the assertion is `O(features × lcov-file-size)` — small numbers, ~200ms. The graceful-degradation in dev mode means contributors aren't blocked. The payoff — coverage drift caught at the same latency as TypeScript brands — justifies the machinery.
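
The boot-time assertion weighed in the last bullet could look roughly like the sketch below: compare measured percentages with a declared band, and warn rather than throw when no coverage data exists and dev-seed mode is on. The `assertBand` name, its parameters, and the `Logger` type are hypothetical; this is not the actual `assertFeatureConformance` signature, and the measured values are taken as given rather than derived from lcov here.

```ts
type Band = { statements: number; branches: number; functions: number; lines: number };
type Logger = { warn(message: string): void };

// Assert one band; `measured` is undefined when no lcov data was found.
function assertBand(
  layer: string,
  measured: Band | undefined,
  expected: Band,
  opts: { devSeed: boolean; logger: Logger },
): void {
  if (measured === undefined) {
    const message = `coverage: no lcov data for "${layer}"`;
    if (opts.devSeed) {
      // Graceful degradation: contributors without a coverage run are not blocked.
      opts.logger.warn(message);
      return;
    }
    throw new Error(message);
  }
  const metrics: (keyof Band)[] = ["statements", "branches", "functions", "lines"];
  const failures = metrics
    .filter((metric) => measured[metric] < expected[metric])
    .map((metric) => `${metric} ${measured[metric]}% < ${expected[metric]}%`);
  if (failures.length > 0) {
    throw new Error(`coverage band "${layer}" not met: ${failures.join("; ")}`);
  }
}
```

The per-band work is a fixed number of comparisons, so the cost of the assertion is dominated by reading and parsing the lcov files, consistent with the `O(features × lcov-file-size)` estimate above.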

## Consequences

**Positive:**

- Every PR/task is gated on covering its own diff. An agent shipping an untested slice becomes mechanically impossible at the CI step.
- One source of truth per feature for coverage expectations. The `@repo/media`-style "no coverage block" drift can't recur.
- Trend history lives in the repo. `git log -- coverage/summary.json` answers "how has coverage moved over the last quarter?" without leaving the editor.
- Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
- Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 becomes the row alongside TS brands / ESLint / boot / `pnpm conformance` / fallow / coverage.

**Negative:**

- Implementation surface is non-trivial: 6–8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, ADR + guide + glossary + generator + hook updates.
- The boot-time assertion adds a small dependency on `coverage/lcov.info` existing. Graceful degradation in dev mode handles this, but the implementation needs care.
- `coverage/summary.json` committed on merge introduces a small CI permissions surface (`contents: write`) gated to the main-branch workflow.
- Mutation testing is slow. The nightly cadence is the compromise; on-demand `pnpm mutate` is opt-in but rare in practice.
## Implementation phasing

In order (each landable independently):

1. **L0 unification** — fix `@repo/media` (missing dep + missing config block) and `@repo/navigation` (real test gaps) so every feature passes its declared bands today.
2. **Manifest schema** — extend `feature.manifest.ts` shape (Zod schema in `core-shared/conformance/`) with the `coverage:` section.
3. **Vitest auto-derive** — `vitest.config.ts` per feature imports the manifest and emits `coverage.thresholds`. Eliminates duplication.
4. **L1 diff coverage** — `scripts/coverage/diff.mjs` + `pnpm coverage:diff` script + CI gate.
5. **L2 aggregate** — `scripts/coverage/aggregate.mjs` + `pnpm coverage:aggregate` + summary.json + merge-to-main workflow.
6. **L3 mutation testing** — Stryker setup + `pnpm mutate` + nightly GH Action.
7. **Boot-time `assertFeatureConformance`** — coverage band read against lcov.
8. **Docs + generator + hook rollout** — this ADR (now landing), `docs/guides/coverage.md`, glossary entries, `pnpm turbo gen feature` template update, `.claude/hooks/prompt-context.sh` keyword group.
## Related

- ADR-006 — vertical-feature-packages
- ADR-011 — TDD foundation
- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads `pnpm coverage:diff`)
- PRD `2026-05-13-coverage-architecture` — implementation seed