agentic-dev-template/docs/decisions/adr-020-coverage-architecture.md
Danijel Martinek 4dce1df084 docs(coverage): ADR-020 + glossary entries + hook keyword group
Architecture record for the agent-first coverage initiative seeded by
the 2026-05-13 PRD. Captures the durable decisions:

- 4-layer architecture (L0 vitest, L1 diff, L2 aggregate, L3 mutation)
- Manifest-driven coverage band as single source of truth (vitest +
  assertFeatureConformance + pnpm coverage:diff all read from it)
- Cover-the-diff (changed lines), not cover-the-new-code
- Committed coverage/summary.json (no SaaS), trend via git log
- Mutation testing scoped to entities + use-cases, on-demand only
- Machine-first output format (JSON stdout, human stderr)

Glossary gets a new "Coverage" section with 7 entries (coverage band,
L0-L3 layers, diff coverage, mutation testing, mutation score,
coverage/summary.json), plus two relationship rows and a flagged
ambiguity for "coverage" qualifiers.

prompt-context.sh hook gets a 9th keyword group — when a prompt
mentions coverage / uncovered / lcov / mutation / stryker, the
relevant ADR + guide path are injected as additional context for
the turn.

This is the documentation layer of the coverage epic. Implementation
(manifest schema, vitest auto-derive, scripts, boot assertion,
mutation tooling) lands in subsequent stories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:42:26 +02:00


ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)

Status: Accepted
Date: 2026-05-13
Builds on: ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
PRD: docs/work/prds/2026-05-13-coverage-architecture.prd.md

Context

ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80 for statements/branches/functions/lines) and stricter per-layer bands (100% on entities/, application/use-cases/, interface-adapters/controllers/). The ESLint conformance rule usecase-must-have-test-file enforces that a test file exists. CI runs pnpm test -- --coverage and uploads **/coverage/lcov.info as an artifact.

This leaves five gaps that matter especially in an agent-first repo:

  1. No diff-coverage gate. A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
  2. No aggregate visibility. N separate lcov files; no merged view, no trend over time.
  3. Threshold declarations are duplicated across 5+ vitest.config.ts files, and they drift. The drift is mechanical to detect once someone looks. The 2026-05-13 brainstorm found two cases: @repo/media had no coverage: block at all, and @repo/navigation failed its declared layer thresholds in entities + controllers.
  4. 100% coverage with weak assertions is invisible. Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
  5. Coverage data isn't agent-readable. The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".

Decision

1. Adopt a 4-layer coverage architecture mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):

| Layer | Catches | Latency | Surface |
|---|---|---|---|
| L0 Per-layer vitest thresholds | Drift below declared bands | ~5–30s per package | pnpm test --coverage (existing) |
| L1 Diff coverage | Changed line not exercised | ~5s after L0 | pnpm coverage:diff; CI gate; dispatch post-task |
| L2 Aggregate trend | Drift across the codebase over time | ~10s | pnpm coverage:aggregate; committed coverage/summary.json |
| L3 Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | pnpm mutate; on-demand, not default pnpm test |

Each layer answers a distinct question; none replaces the others.

2. Make feature.manifest.ts the single source of truth for coverage expectations. A new coverage: section per feature:

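```ts
// Inside a feature's feature.manifest.ts: the new coverage: property.
coverage: {
  bands: {
    // Stricter per-layer bands, keyed by layer name.
    "entities":    { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases":   { statements: 100, branches:  95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches:  95, functions: 100, lines: 100 },
    // Fallback for files outside the named layers.
    baseline:      { statements:  80, branches:  75, functions:  80, lines:  80 },
  },
  // Layers that mutation testing targets (see decision 5).
  mutationTargets: ["entities", "use-cases"],
}
```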
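Step 2 of the phasing plans a Zod schema for this section in core-shared/conformance/. The sketch below shows in plain TypeScript, without pulling in Zod, what such a schema might enforce. The type names and the validateCoverage helper are assumptions for illustration, and so is the rule that a layer band may tighten the baseline but never loosen it. None of this is the planned schema itself.

```ts
// Illustrative plain-TypeScript stand-in for the planned Zod schema.
// All names are hypothetical; the layer list mirrors the manifest example.
type Metric = "statements" | "branches" | "functions" | "lines";
type Band = Record<Metric, number>;
type Layer = "entities" | "use-cases" | "controllers";

interface FeatureCoverage {
  bands: Partial<Record<Layer, Band>> & { baseline: Band };
  mutationTargets: Layer[];
}

const METRICS: readonly Metric[] = ["statements", "branches", "functions", "lines"];
const LAYERS: readonly string[] = ["entities", "use-cases", "controllers"];

// Returns human-readable problems; an empty array means the section is valid.
function validateCoverage(coverage: FeatureCoverage): string[] {
  const problems: string[] = [];
  const baseline = coverage.bands.baseline;
  for (const [name, band] of Object.entries(coverage.bands)) {
    if (band === undefined) continue;
    if (name !== "baseline" && !LAYERS.includes(name)) {
      problems.push(`unknown band "${name}"`);
    }
    for (const metric of METRICS) {
      const value = band[metric];
      if (!Number.isFinite(value) || value < 0 || value > 100) {
        problems.push(`${name}.${metric} must be a percentage in [0, 100]`);
      } else if (name !== "baseline" && value < baseline[metric]) {
        // Assumption: a layer band may tighten the baseline but never loosen it.
        problems.push(`${name}.${metric} is looser than baseline`);
      }
    }
  }
  for (const target of coverage.mutationTargets) {
    if (!LAYERS.includes(target)) problems.push(`unknown mutation target "${target}"`);
  }
  return problems;
}
```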

Three readers consume the manifest:

  • Vitest: vitest.config.ts imports the manifest's coverage section and emits its thresholds. The duplicated block in 5 per-feature vitest configs goes away.
  • assertFeatureConformance: reads the package's coverage/lcov.info at boot and asserts each band. Degrades gracefully when USE_DEV_SEED=true (warns rather than throws when lcov is absent).
  • pnpm coverage:diff: applies baseline to files that match no layer; a stricter layer band overrides it for files matching that layer's path glob.

This eliminates the duplication that caused the @repo/media drift and centralizes one decision in one place per feature.
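The Vitest reader and the coverage:diff reader can each be sketched as a pure function over the bands above. The sketch assumes each layer maps to a fixed directory prefix under src/. The LAYER_DIRS table is hypothetical, and a prefix match stands in for real glob matching. toVitestThresholds relies on Vitest's coverage.thresholds accepting global metrics plus per-glob entries, which recent Vitest versions support. Check this against the installed version.

```ts
type Metric = "statements" | "branches" | "functions" | "lines";
type Band = Record<Metric, number>;
// Layer bands keyed by layer name, plus the mandatory baseline.
type Bands = Record<string, Band> & { baseline: Band };

// Hypothetical layer-to-directory mapping; the src/ prefix is an assumption.
const LAYER_DIRS: Record<string, string> = {
  entities: "src/entities/",
  "use-cases": "src/application/use-cases/",
  controllers: "src/interface-adapters/controllers/",
};

// Vitest reader: baseline becomes the global threshold, each layer a glob entry.
function toVitestThresholds(bands: Bands): Record<string, number | Band> {
  const { baseline, ...layers } = bands;
  const thresholds: Record<string, number | Band> = { ...baseline };
  for (const [layer, band] of Object.entries(layers)) {
    const dir = LAYER_DIRS[layer];
    if (dir !== undefined) thresholds[`${dir}**`] = band;
  }
  return thresholds;
}

// coverage:diff reader: the first layer whose directory prefixes the file wins;
// files outside every layer fall back to the baseline.
function bandForFile(bands: Bands, file: string): Band {
  for (const [layer, dir] of Object.entries(LAYER_DIRS)) {
    const band = bands[layer];
    if (band !== undefined && file.startsWith(dir)) return band;
  }
  return bands.baseline;
}
```

A feature's vitest.config.ts could then pass toVitestThresholds(manifest.coverage.bands) as test.coverage.thresholds, and pnpm coverage:diff could call bandForFile once per changed file.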

3. Diff coverage is cover-the-diff, not cover-the-new-code. Every changed executable line must have an execution count > 0 in the merged lcov. Modified lines count as well as new ones, which catches "agent edited code, didn't update the test." Allowlisted (exempt from the check): *.test.ts, *.config.*, *.md, *.json, *.mjs, plus the per-package exclude lists.
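Below is a minimal sketch of the cover-the-diff check. It assumes the changed line numbers per file have already been extracted from git diff; that parsing is not shown. It also assumes lcov SF: paths and diff paths use the same repo-relative form. parseLcov and uncoveredChangedLines are hypothetical names. Only the SF:/DA:/end_of_record record syntax comes from the lcov format.

```ts
type LineHits = Map<number, number>;

// Parse DA:<line>,<count>[,<checksum>] records, keyed by the SF: source path.
function parseLcov(text: string): Map<string, LineHits> {
  const files = new Map<string, LineHits>();
  let current: LineHits | undefined;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const path = line.slice(3);
      // Reuse the entry if a merged lcov lists the same file more than once.
      current = files.get(path) ?? new Map<number, number>();
      files.set(path, current);
    } else if (line.startsWith("DA:") && current !== undefined) {
      const [lineNo, count] = line.slice(3).split(",").map(Number);
      current.set(lineNo, (current.get(lineNo) ?? 0) + count);
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return files;
}

// The ADR's allowlist; per-package exclude lists would be appended here.
const EXEMPT: RegExp[] = [/\.test\.ts$/, /\.config\.[^\/]*$/, /\.md$/, /\.json$/, /\.mjs$/];

interface Gap { file: string; lines: number[] }

function uncoveredChangedLines(changed: Map<string, number[]>, lcov: Map<string, LineHits>): Gap[] {
  const gaps: Gap[] = [];
  for (const [file, lines] of changed) {
    if (EXEMPT.some((re) => re.test(file))) continue;
    const hits = lcov.get(file);
    // A changed file absent from lcov was never loaded by a test; treating all
    // of its changed lines as uncovered is a conservative assumption.
    // Otherwise, lines with no DA record are treated as non-executable and skipped.
    const missed = hits === undefined ? lines : lines.filter((n) => hits.get(n) === 0);
    if (missed.length > 0) gaps.push({ file, lines: missed });
  }
  return gaps;
}
```

Because modified lines are checked the same way as new ones, an edited use-case line that no test executes shows up as a gap. An edited comment has no DA record and does not.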

4. Aggregate trend ships in-tree, not via SaaS. coverage/summary.json is committed on merge to main. Trend readable via git log -- coverage/summary.json. No external service dependency; the dispatch loop can read history without a network call.
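Reading the trend needs nothing beyond git. git log --format=%H -- coverage/summary.json lists the commits that touched the file, and git show &lt;sha&gt;:coverage/summary.json prints each snapshot. Comparing two parsed snapshots is then a small pure function. The summary shape below (line coverage per feature) is an assumption, since the ADR does not fix the schema. The trend helper is hypothetical.

```ts
// Hypothetical shape of coverage/summary.json: line coverage (%) per feature.
interface Summary {
  features: Record<string, { lines: number }>;
}

interface Movement { feature: string; from: number; to: number; delta: number }

// Compare two snapshots, e.g. the oldest and newest returned by the git commands.
function trend(oldest: Summary, newest: Summary): Movement[] {
  const out: Movement[] = [];
  for (const [feature, { lines: to }] of Object.entries(newest.features)) {
    const before = oldest.features[feature];
    if (before === undefined) continue; // feature added in between: no baseline
    // Round to one decimal so float noise doesn't show up as movement.
    const delta = Math.round((to - before.lines) * 10) / 10;
    out.push({ feature, from: before.lines, to, delta });
  }
  // Largest regressions first.
  return out.sort((a, b) => a.delta - b.delta);
}
```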

5. Mutation testing is opt-in and narrowly scoped. Stryker with @stryker-mutator/vitest-runner, runs on entities/ + application/use-cases/ only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of pnpm test. Nightly GH Action surfaces score drift > 5%.
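The nightly check reduces to two comparisons. This sketch uses Stryker's definition of mutation score: killed plus timed-out mutants, divided by all valid mutants (those two plus survived and uncovered ones). It applies the ADR's 80% default. It reads "drift > 5%" as more than 5 percentage points in either direction since the previous run. That reading, the MutantCounts field names, and both helper names are assumptions. Stryker's own thresholds.break option could also enforce the floor directly.

```ts
// Per-status mutant counts; field names are ours, not Stryker's report schema.
interface MutantCounts { killed: number; timeout: number; survived: number; noCoverage: number }

// detected / valid * 100, where detected = killed + timeout and
// valid = detected + survived + noCoverage. NaN when nothing was scorable.
function mutationScore(c: MutantCounts): number {
  const detected = c.killed + c.timeout;
  const valid = detected + c.survived + c.noCoverage;
  return valid === 0 ? Number.NaN : (100 * detected) / valid;
}

// Findings for the nightly run; an empty array means nothing to surface.
function nightlyVerdict(score: number, previous: number | undefined, threshold = 80): string[] {
  const findings: string[] = [];
  if (score < threshold) findings.push(`score ${score.toFixed(1)}% is below ${threshold}%`);
  if (previous !== undefined && Math.abs(score - previous) > 5) {
    findings.push(`score moved ${(score - previous).toFixed(1)} points since last run`);
  }
  return findings;
}
```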

6. Output format is machine-first. pnpm coverage:diff emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
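The stdout/stderr split can be sketched as two renderers over one report object. The DiffReport shape and the function names are hypothetical. In Node, console.log writes to stdout and console.error writes to stderr. The dispatch loop can therefore parse stdout as a single JSON document, while the human summary is free to change wording.

```ts
// Hypothetical report shape for pnpm coverage:diff.
interface DiffReport {
  ok: boolean;
  gaps: { file: string; lines: number[] }[];
}

// Machine channel: exactly one JSON document.
function renderStdout(report: DiffReport): string {
  return JSON.stringify(report);
}

// Human channel: a short, readable summary.
function renderStderr(report: DiffReport): string {
  if (report.ok) return "coverage:diff: every changed line is covered";
  const lines = report.gaps.map((g) => `  ${g.file}: ${g.lines.join(", ")}`);
  return ["coverage:diff: uncovered changed lines", ...lines].join("\n");
}

function emit(report: DiffReport): void {
  console.log(renderStdout(report)); // stdout in Node
  console.error(renderStderr(report)); // stderr in Node
}
```

A real script would presumably also exit non-zero when ok is false, so the CI gate can fail on it.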

Alternatives considered

  • Codecov / Coveralls SaaS. Polished PR comments and trend dashboards, free for OSS. Rejected as the primary L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
  • Cover-the-new-code instead of cover-the-diff. Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
  • Keep thresholds in per-package vitest configs. Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
  • Run mutation testing in default pnpm test. Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
  • Mutation testing across all layers. Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
  • Use ESLint or fallow for diff coverage. Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside pnpm test, not in pnpm lint or pnpm fallow.
  • Skip the boot-time coverage assertion as too heavy. Considered and rejected. The assertion is O(features × lcov-file-size), and both factors are small, so the check costs ~200ms. Graceful degradation in dev mode means contributors aren't blocked. The payoff (coverage drift caught at the same latency as TypeScript brands) justifies the machinery.
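The cost argument can be made concrete. For each feature, the check is one pass over lcov.info, summing the LF/LH, BRF/BRH and FNF/FNH summary records for that feature's files. The sketch below uses hypothetical names and a directory prefix in place of the real layer glob. Note that lcov carries no statement totals, so an lcov-based boot check can only assert the lines, branches and functions metrics of a band. The missing-file branch mirrors the USE_DEV_SEED=true degradation described in decision 2.

```ts
type Metric = "lines" | "branches" | "functions";
type Band = Record<Metric, number>;

// lcov summary records: [found, hit] per metric.
const FIELDS: Record<Metric, [found: string, hit: string]> = {
  lines: ["LF", "LH"],
  branches: ["BRF", "BRH"],
  functions: ["FNF", "FNH"],
};

// One pass over the lcov text, summing records for files whose SF: path starts with prefix.
function percentages(lcov: string, prefix: string): Band {
  const totals = new Map<string, number>();
  let inScope = false;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) inScope = line.slice(3).startsWith(prefix);
    const m = /^([A-Z]+):(\d+)$/.exec(line); // matches LF:10, BRH:3, etc.
    if (inScope && m) totals.set(m[1], (totals.get(m[1]) ?? 0) + Number(m[2]));
  }
  const pct = (metric: Metric): number => {
    const [found, hit] = FIELDS[metric];
    const f = totals.get(found) ?? 0;
    // Assumption: nothing executable means nothing was missed.
    return f === 0 ? 100 : ((totals.get(hit) ?? 0) / f) * 100;
  };
  return { lines: pct("lines"), branches: pct("branches"), functions: pct("functions") };
}

function assertBand(lcov: string | undefined, prefix: string, band: Band, devSeed: boolean): void {
  if (lcov === undefined) {
    const msg = "coverage/lcov.info not found; run pnpm test --coverage";
    if (devSeed) {
      console.warn(msg); // USE_DEV_SEED=true: warn, don't block boot
      return;
    }
    throw new Error(msg);
  }
  const actual = percentages(lcov, prefix);
  const failing = (Object.keys(band) as Metric[]).filter((m) => actual[m] < band[m]);
  if (failing.length > 0) throw new Error(`${prefix}: below band on ${failing.join(", ")}`);
}
```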

Consequences

Positive:

  • Every PR/task is gated on covering its own diff. An agent can no longer ship an untested slice past the CI step.
  • One source of truth per feature for coverage expectations. The @repo/media-style "no coverage block" drift can't recur.
  • Trend history lives in the repo. git log -- coverage/summary.json answers "how has coverage moved over the last quarter?" without leaving the editor.
  • Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
  • Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
  • Coverage joins the 5-gate conformance philosophy as a first-class signal: ADR-020 adds a coverage row alongside TS brands / ESLint / boot / pnpm conformance / fallow.

Negative:

  • Implementation surface is non-trivial: 6–8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, and ADR + guide + glossary + generator + hook updates.
  • The boot-time assertion adds a small dependency on coverage/lcov.info existing. Graceful degradation in dev mode handles this, but the implementation needs care.
  • coverage/summary.json committed on merge introduces a small CI permissions surface (contents: write) gated to the main-branch workflow.
  • Mutation testing is slow. The nightly cadence is the compromise; on-demand pnpm mutate runs are opt-in and likely to be rare in practice.

Implementation phasing

In order (each landable independently):

  1. L0 unification: fix @repo/media (missing dep + missing config block) and @repo/navigation (real test gaps) so every feature passes its declared bands today.
  2. Manifest schema: extend the feature.manifest.ts shape (Zod schema in core-shared/conformance/) with the coverage: section.
  3. Vitest auto-derive: each feature's vitest.config.ts imports the manifest and emits coverage.thresholds. Eliminates duplication.
  4. L1 diff coverage: scripts/coverage/diff.mjs + pnpm coverage:diff script + CI gate.
  5. L2 aggregate: scripts/coverage/aggregate.mjs + pnpm coverage:aggregate + summary.json + merge-to-main workflow.
  6. L3 mutation testing: Stryker setup + pnpm mutate + nightly GH Action.
  7. Boot-time assertFeatureConformance: coverage band read against lcov.
  8. Docs + generator + hook rollout: this ADR (now landing), docs/guides/coverage.md, glossary entries, pnpm turbo gen feature template update, .claude/hooks/prompt-context.sh keyword group.

References

  • ADR-006: vertical-feature-packages
  • ADR-011: TDD foundation
  • ADR-018: audit-and-compliance (similar manifest-declared shape pattern)
  • ADR-019: sandcastle agent orchestration (the dispatch loop that reads pnpm coverage:diff)
  • PRD 2026-05-13-coverage-architecture: implementation seed