agentic-dev/docs/decisions/adr-020-coverage-architecture.md
Danijel Martinek 4dce1df084 docs(coverage): ADR-020 + glossary entries + hook keyword group
Architecture record for the agent-first coverage initiative seeded by
the 2026-05-13 PRD. Captures the durable decisions:

- 4-layer architecture (L0 vitest, L1 diff, L2 aggregate, L3 mutation)
- Manifest-driven coverage band as single source of truth (vitest +
  assertFeatureConformance + pnpm coverage:diff all read from it)
- Cover-the-diff (changed lines), not cover-the-new-code
- Committed coverage/summary.json (no SaaS), trend via git log
- Mutation testing scoped to entities + use-cases, on-demand only
- Machine-first output format (JSON stdout, human stderr)

Glossary gets a new "Coverage" section with 7 entries (coverage band,
L0-L3 layers, diff coverage, mutation testing, mutation score,
coverage/summary.json), plus two relationship rows and a flagged
ambiguity for "coverage" qualifiers.

prompt-context.sh hook gets a 9th keyword group — when a prompt
mentions coverage / uncovered / lcov / mutation / stryker, the
relevant ADR + guide path are injected as additional context for
the turn.

This is the documentation layer of the coverage epic. Implementation
(manifest schema, vitest auto-derive, scripts, boot assertion,
mutation tooling) lands in subsequent stories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:42:26 +02:00

# ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)
**Status:** Accepted
**Date:** 2026-05-13
**Builds on:** ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
**PRD:** docs/work/prds/2026-05-13-coverage-architecture.prd.md
## Context
ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80) and stricter per-layer bands (100% on `entities/`, `application/use-cases/`, `interface-adapters/controllers/`). The ESLint conformance rule `usecase-must-have-test-file` enforces _that a test file exists_. CI runs `pnpm test -- --coverage` and uploads `**/coverage/lcov.info` as an artifact.
This leaves five gaps that matter especially in an agent-first repo:
1. **No diff-coverage gate.** A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
2. **No aggregate visibility.** N separate lcov files; no merged view, no trend over time.
3. **Threshold declarations are duplicated** across 5+ `vitest.config.ts` files. Drift is mechanical to spot (we did it during the 2026-05-13 brainstorm: `@repo/media` had no `coverage:` block at all; `@repo/navigation` failed its declared layer thresholds in entities + controllers).
4. **100% coverage with weak assertions is invisible.** Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
5. **Coverage data isn't agent-readable.** The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".
## Decision
**1. Adopt a 4-layer coverage architecture** mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):
| Layer | Catches | Latency | Surface |
| ---------------------------------- | ---------------------------------------------------- | ------------------ | ------------------------------------------------------------ |
| **L0** Per-layer vitest thresholds | Drift below declared bands                           | ~5-30s per package | `pnpm test --coverage` (existing)                            |
| **L1** Diff coverage | Changed line not exercised | ~5s after L0 | `pnpm coverage:diff`; CI gate; dispatch post-task |
| **L2** Aggregate trend | Drift across the codebase over time | ~10s | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | `pnpm mutate`; on-demand, not default `pnpm test` |
Each layer answers a distinct question; none replaces the others.
**2. Make `feature.manifest.ts` the single source of truth for coverage expectations.** A new `coverage:` section per feature:
```ts
coverage: {
  bands: {
    "entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
}
```
Three readers consume the manifest:
- **Vitest** — `vitest.config.ts` imports the manifest's `coverage` and emits its `thresholds`. The duplicated block in 5 per-feature vitest configs goes away.
- **`assertFeatureConformance`** — reads `coverage/lcov.info` for the package at boot and asserts each band. Graceful degradation in `USE_DEV_SEED=true` (warns rather than throws when lcov is absent).
- **`pnpm coverage:diff`** — uses `baseline` for uncategorized files; stricter layer bands override per matching path glob.
This eliminates the duplication that caused the `@repo/media` drift and centralizes one decision in one place per feature.
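As a rough illustration of the Vitest reader, the sketch below turns a manifest `coverage` section into an object that could be passed as `test.coverage.thresholds`. The `Band` and `CoverageManifest` types, the `LAYER_GLOBS` mapping, and the `toVitestThresholds` name are assumptions made for this example; the ADR fixes only the field names shown in the manifest snippet above. It relies on Vitest accepting glob-pattern keys inside `thresholds`.

```ts
// Hypothetical types: the ADR specifies the field names, not these type names.
type Band = { statements: number; branches: number; functions: number; lines: number };

interface CoverageManifest {
  bands: { baseline: Band } & Record<string, Band>;
  mutationTargets: string[];
}

// Assumed mapping from band name to source glob; the real package layout may differ.
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// Build the value for `test.coverage.thresholds`.
// Top-level numbers come from `baseline`; glob keys carry the stricter layer bands.
function toVitestThresholds(coverage: CoverageManifest): Record<string, number | Band> {
  const { baseline, ...layers } = coverage.bands;
  const thresholds: Record<string, number | Band> = { ...baseline };
  for (const [name, band] of Object.entries(layers)) {
    const glob = LAYER_GLOBS[name];
    if (glob === undefined) {
      // Fail loudly so a new band cannot be silently ignored.
      throw new Error(`No glob mapped for coverage band "${name}"`);
    }
    thresholds[glob] = band;
  }
  return thresholds;
}
```

Throwing on an unmapped band name is a design choice for the sketch: it keeps a typo in the manifest from quietly dropping a threshold.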
**3. Diff coverage is cover-the-diff, not cover-the-new-code.** Every changed _executable_ line must have execution-count > 0 against the merged lcov. Modified-but-not-new lines count too — catches "agent edited code, didn't update the test." Allowlist: `*.test.ts`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, plus the per-package exclude lists.
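A minimal sketch of the cover-the-diff check, assuming the changed lines per file are already known (for example from a unified diff) and the merged lcov is available as text. The function names, the `ChangedLines` shape, and the choice to flag every changed line of a file that is missing from lcov are assumptions for illustration. The `SF:`, `DA:<line>,<hits>`, and `end_of_record` records are standard lcov tracefile syntax.

```ts
// Executable line number -> execution count, read from lcov `DA:` records.
type LineHits = Map<number, number>;

function parseLcov(lcov: string): Map<string, LineHits> {
  const files = new Map<string, LineHits>();
  let current: LineHits | undefined;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const path = line.slice(3);
      current = files.get(path) ?? new Map<number, number>();
      files.set(path, current);
    } else if (line.startsWith("DA:") && current) {
      const parts = line.slice(3).split(",");
      const lineNo = Number(parts[0]);
      const hits = Number(parts[1]);
      // Sum counts so duplicate records in a merged file accumulate.
      current.set(lineNo, (current.get(lineNo) ?? 0) + hits);
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return files;
}

// Hypothetical input shape: file path -> changed line numbers.
type ChangedLines = Record<string, number[]>;

function uncoveredChangedLines(
  changed: ChangedLines,
  lcov: string,
  allowlist: RegExp[] = [],
): Record<string, number[]> {
  const coverage = parseLcov(lcov);
  const misses: Record<string, number[]> = {};
  for (const [file, lines] of Object.entries(changed)) {
    if (allowlist.some((re) => re.test(file))) continue;
    const hits = coverage.get(file);
    // Lines without a DA record are treated as non-executable and skipped.
    // Assumption: a file absent from lcov has every changed line flagged.
    const missed = hits === undefined ? lines : lines.filter((n) => hits.get(n) === 0);
    if (missed.length > 0) misses[file] = missed;
  }
  return misses;
}
```

Because modified lines are checked the same way as new ones, an edit to an existing use case whose test was not updated shows up here as an uncovered line.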
**4. Aggregate trend ships in-tree, not via SaaS.** `coverage/summary.json` is committed on merge to main. Trend readable via `git log -- coverage/summary.json`. No external service dependency; the dispatch loop can read history without a network call.
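One possible shape for the aggregate step, sketched under the assumption that `coverage/summary.json` stores per-package and total line counts. The `CoverageSummary` shape and the `summarize` name are hypothetical; the ADR does not define the file's schema. The `LF:` (lines found) and `LH:` (lines hit) records are standard lcov.

```ts
interface LineTotals {
  found: number;
  hit: number;
  pct: number;
}

// Hypothetical schema for coverage/summary.json.
interface CoverageSummary {
  packages: Record<string, LineTotals>;
  total: LineTotals;
}

// Percentage rounded to two decimals; an empty package counts as fully covered (assumption).
function toPct(hit: number, found: number): number {
  return found === 0 ? 100 : Math.round((hit * 10000) / found) / 100;
}

function summarize(lcovByPackage: Record<string, string>): CoverageSummary {
  const packages: Record<string, LineTotals> = {};
  let totalFound = 0;
  let totalHit = 0;
  // Sort package names so the committed file produces stable diffs.
  const entries = Object.entries(lcovByPackage).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [name, lcov] of entries) {
    let found = 0;
    let hit = 0;
    for (const raw of lcov.split("\n")) {
      const line = raw.trim();
      if (line.startsWith("LF:")) found += Number(line.slice(3));
      else if (line.startsWith("LH:")) hit += Number(line.slice(3));
    }
    packages[name] = { found, hit, pct: toPct(hit, found) };
    totalFound += found;
    totalHit += hit;
  }
  return { packages, total: { found: totalFound, hit: totalHit, pct: toPct(totalHit, totalFound) } };
}
```

The sketch deliberately omits a timestamp: the commit that updates the file already carries the date, and leaving it out means the file changes only when coverage changes, which keeps `git log -- coverage/summary.json` meaningful as a trend.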
**5. Mutation testing is opt-in and narrowly scoped.** Stryker with `@stryker-mutator/vitest-runner`, runs on `entities/` + `application/use-cases/` only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of `pnpm test`. Nightly GH Action surfaces score drift > 5%.
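To make the threshold and drift rules concrete, here is a small sketch of how a nightly job might evaluate one feature's result. The score formula follows Stryker's documented definition (detected mutants divided by valid mutants). The `MutantCounts` shape, the `evaluateMutationRun` name, reading "drift > 5%" as a drop of more than five percentage points, and treating a run with no mutants as passing are all assumptions.

```ts
interface MutantCounts {
  killed: number;
  timeout: number;
  survived: number;
  noCoverage: number;
}

// Score in percent: detected = killed + timeout; valid = detected + survived + noCoverage.
function mutationScore(c: MutantCounts): number {
  const detected = c.killed + c.timeout;
  const valid = detected + c.survived + c.noCoverage;
  // Assumption: a feature with no valid mutants is treated as passing.
  return valid === 0 ? 100 : (detected * 100) / valid;
}

interface MutationVerdict {
  score: number;
  belowThreshold: boolean;
  drifted: boolean;
}

function evaluateMutationRun(
  current: MutantCounts,
  previousScore: number | null,
  threshold = 80, // default per-feature threshold from this ADR
  maxDrop = 5, // assumed to mean percentage points
): MutationVerdict {
  const score = mutationScore(current);
  return {
    score,
    belowThreshold: score < threshold,
    drifted: previousScore !== null && previousScore - score > maxDrop,
  };
}
```

Keeping the two checks separate lets a feature sit above its threshold and still be flagged when its score falls sharply between nightly runs.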
**6. Output format is machine-first.** `pnpm coverage:diff` emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
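The split between the two streams could look roughly like the sketch below, which renders one result into a JSON payload for stdout and a text summary for stderr. The `DiffCoverageReport` shape, the `renderDiffCoverage` name, and the non-zero exit code on failure are assumptions; the ADR specifies only which stream carries which format.

```ts
// Hypothetical JSON contract read by the dispatch loop.
interface DiffCoverageReport {
  ok: boolean;
  uncovered: Record<string, number[]>;
}

interface RenderedOutput {
  stdout: string; // machine-readable JSON, one document
  stderr: string; // human-readable summary
  exitCode: number; // assumed: 0 on success, 1 when lines are uncovered
}

function renderDiffCoverage(uncovered: Record<string, number[]>): RenderedOutput {
  const entries = Object.entries(uncovered);
  const report: DiffCoverageReport = { ok: entries.length === 0, uncovered };
  const human = report.ok
    ? "diff coverage: all changed lines executed"
    : [
        "diff coverage: uncovered changed lines",
        ...entries.map(([file, lines]) => `  ${file}: ${lines.join(", ")}`),
      ].join("\n");
  return { stdout: JSON.stringify(report), stderr: human, exitCode: report.ok ? 0 : 1 };
}
```

A CLI wrapper would write `stdout` to `process.stdout` and `stderr` to `process.stderr`, so a consumer can pipe stdout straight into a JSON parser without stripping log lines.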
## Alternatives considered
- **Codecov / Coveralls SaaS.** Polished PR comments and trend dashboards, free for OSS. Rejected as the _primary_ L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
- **Cover-the-new-code instead of cover-the-diff.** Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
- **Keep thresholds in per-package vitest configs.** Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
- **Run mutation testing in default `pnpm test`.** Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
- **Mutation testing across all layers.** Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
- **Use ESLint or fallow for diff coverage.** Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside `pnpm test`, not in `pnpm lint` or `pnpm fallow`.
- **Boot-time coverage assertion is too heavy.** Considered. Counter-argument: the assertion is `O(features × lcov-file-size)` — small numbers, ~200ms. The graceful-degradation in dev mode means contributors aren't blocked. The payoff — coverage drift caught at the same latency as TypeScript brands — justifies the machinery.
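The boot-time behaviour discussed in the last item can be sketched as follows, simplified to a single line-coverage threshold. The `assertCoverageAtBoot` name, its parameters, and the result shape are hypothetical; a real `assertFeatureConformance` would check every band and metric from the manifest. The one behaviour taken from this ADR is that a missing lcov file warns in dev-seed mode and throws otherwise.

```ts
type BootCoverageResult =
  | { status: "ok"; linePct: number }
  | { status: "skipped"; warning: string };

// Line coverage in percent from lcov `LF:` (found) and `LH:` (hit) records.
function linePctFromLcov(lcov: string): number {
  let found = 0;
  let hit = 0;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("LF:")) found += Number(line.slice(3));
    else if (line.startsWith("LH:")) hit += Number(line.slice(3));
  }
  return found === 0 ? 100 : (hit * 100) / found;
}

// `lcov` is the file content, or null when coverage/lcov.info does not exist.
// `devSeed` stands in for USE_DEV_SEED=true.
function assertCoverageAtBoot(
  lcov: string | null,
  requiredLinePct: number,
  devSeed: boolean,
): BootCoverageResult {
  if (lcov === null) {
    const message = "coverage/lcov.info not found; run the coverage task first";
    // Graceful degradation: warn in dev-seed mode, fail everywhere else.
    if (devSeed) return { status: "skipped", warning: message };
    throw new Error(message);
  }
  const linePct = linePctFromLcov(lcov);
  if (linePct < requiredLinePct) {
    throw new Error(`line coverage ${linePct.toFixed(2)}% is below the declared ${requiredLinePct}%`);
  }
  return { status: "ok", linePct };
}
```

Only the missing-file case degrades in this sketch; a band that is present but failing still throws in dev mode, which matches the wording in the Decision section.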
## Consequences
**Positive:**
- Every PR/task is gated on covering its own diff. An agent shipping an untested slice becomes mechanically impossible at the CI step.
- One source of truth per feature for coverage expectations. The `@repo/media`-style "no coverage block" drift can't recur.
- Trend history lives in the repo. `git log -- coverage/summary.json` answers "how has coverage moved over the last quarter?" without leaving the editor.
- Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
- Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 adds the coverage row alongside TS brands / ESLint / boot / `pnpm conformance` / fallow.
**Negative:**
- Implementation surface is non-trivial: 6-8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, ADR + guide + glossary + generator + hook updates.
- The boot-time assertion adds a small dependency on `coverage/lcov.info` existing. Graceful degradation in dev mode handles this, but the implementation needs care.
- `coverage/summary.json` committed on merge introduces a small CI permissions surface (`contents: write`) gated to the main-branch workflow.
- Mutation testing is slow. The nightly cadence is the compromise; on-demand `pnpm mutate` is opt-in but rare in practice.
## Implementation phasing
In order (each landable independently):
1. **L0 unification** — fix `@repo/media` (missing dep + missing config block) and `@repo/navigation` (real test gaps) so every feature passes its declared bands today.
2. **Manifest schema** — extend `feature.manifest.ts` shape (Zod schema in `core-shared/conformance/`) with the `coverage:` section.
3. **Vitest auto-derive** — `vitest.config.ts` per feature imports the manifest and emits `coverage.thresholds`. Eliminates duplication.
4. **L1 diff coverage** — `scripts/coverage/diff.mjs` + `pnpm coverage:diff` script + CI gate.
5. **L2 aggregate** — `scripts/coverage/aggregate.mjs` + `pnpm coverage:aggregate` + summary.json + merge-to-main workflow.
6. **L3 mutation testing** — Stryker setup + `pnpm mutate` + nightly GH Action.
7. **Boot-time `assertFeatureConformance`** — coverage band read against lcov.
8. **Docs + generator + hook rollout** — this ADR (now landing), `docs/guides/coverage.md`, glossary entries, `pnpm turbo gen feature` template update, `.claude/hooks/prompt-context.sh` keyword group.
## Related
- ADR-006 — vertical-feature-packages
- ADR-011 — TDD foundation
- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads `pnpm coverage:diff`)
- PRD `2026-05-13-coverage-architecture` — implementation seed