agentic-dev/docs/decisions/adr-020-coverage-architecture.md
Danijel Martinek fc27eef6eb docs(coverage): sync docs to shipped state + wire sandcastle prompts
Closes the staleness gap after the 10-commit coverage epic shipped.

Doc sync (item 1 from the user's choice):
  - CLAUDE.md Quick Start: adds pnpm coverage:aggregate / coverage:diff
    / mutate to the command listing
  - CLAUDE.md: new "Sibling architecture: coverage (ADR-020)" section
    after the conformance gate table — captures the 4-layer table +
    points at docs/guides/coverage.md + ADR-020 + says agents must run
    coverage:diff before reporting complete
  - AGENTS.md preamble: now lists coverage as a parallel multi-latency
    quality system alongside conformance, with the same gate / latency
    framing
  - PRD frontmatter: status draft -> shipped + shipped date +
    shipping-commits list (all 10 SHAs anchoring the trace)
  - PRD findings table: each row gets a Resolution column citing the
    commit that closed it; conclusion text updated to past tense
  - ADR-020 implementation phasing: rewritten as a status table with
    each step linked to the commit that shipped it + Boot-time
    assertFeatureConformance explicitly marked Deferred with rationale
  - docs/guides/coverage.md: removed "Boot wiring lands in the next
    story" line; replaced with the deferral rationale + clarified
    that two readers (vitest, coverage:diff) consume the manifest

Sandcastle prompts (item 2 from the user's choice):
  - .sandcastle/implementer.prompt.md: new "Coverage gates" section
    after the conformance-gates list, requiring `pnpm test --coverage`,
    `pnpm coverage:aggregate`, and `pnpm coverage:diff` to all pass
    before reporting `complete`. Machine-readable JSON shape of
    coverage:diff documented (status / uncovered[] / kind enum), with
    explicit instructions on how to interpret each kind. Allowlist
    expansion requires justification + test.
  - .sandcastle/reviewer.prompt.md: AC coverage relabeled to "AC
    coverage (acceptance criteria, not test coverage)" to disambiguate;
    new check #7 "Coverage gates (ADR-020)" requiring CI's
    Coverage — diff (L1) step green + per-layer thresholds met +
    no silent allowlist expansion + manifest band drift detection.

Effect: future agent runs through sandcastle now treat coverage as a
first-class blocking gate, parallel to conformance. PRs no longer
discover coverage failures only via CI; the implementer is required
to check before reporting done, and the reviewer is required to
verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 16:47:16 +02:00


# ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)
**Status:** Accepted
**Date:** 2026-05-13
**Builds on:** ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
**PRD:** docs/work/prds/2026-05-13-coverage-architecture.prd.md
## Context
ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80) and stricter per-layer bands (100% on `entities/`, `application/use-cases/`, `interface-adapters/controllers/`). The ESLint conformance rule `usecase-must-have-test-file` enforces _that a test file exists_. CI runs `pnpm test -- --coverage` and uploads `**/coverage/lcov.info` as an artifact.
This leaves five gaps that matter especially in an agent-first repo:
1. **No diff-coverage gate.** A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
2. **No aggregate visibility.** N separate lcov files; no merged view, no trend over time.
3. **Threshold declarations are duplicated** across 5+ `vitest.config.ts` files. Drift is mechanical to spot (we did it during the 2026-05-13 brainstorm: `@repo/media` had no `coverage:` block at all; `@repo/navigation` failed its declared layer thresholds in entities + controllers).
4. **100% coverage with weak assertions is invisible.** Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
5. **Coverage data isn't agent-readable.** The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".
## Decision
**1. Adopt a 4-layer coverage architecture** mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):
| Layer | Catches | Latency | Surface |
| ---------------------------------- | ---------------------------------------------------- | ------------------ | ------------------------------------------------------------ |
| **L0** Per-layer vitest thresholds | Drift below declared bands                           | ~5-30s per package | `pnpm test --coverage` (existing)                            |
| **L1** Diff coverage | Changed line not exercised | ~5s after L0 | `pnpm coverage:diff`; CI gate; dispatch post-task |
| **L2** Aggregate trend | Drift across the codebase over time | ~10s | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | `pnpm mutate`; on-demand, not default `pnpm test` |
Each layer answers a distinct question; none replaces the others.
**2. Make `feature.manifest.ts` the single source of truth for coverage expectations.** A new `coverage:` section per feature:
```ts
coverage: {
bands: {
"entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
"use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
"controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
},
mutationTargets: ["entities", "use-cases"],
}
```
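Three readers were designed to consume the manifest (the boot-time `assertFeatureConformance` reader was deferred at shipping; see step 10 under Implementation phasing):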
- **Vitest** — `vitest.config.ts` imports the manifest's `coverage` and emits its `thresholds`. The duplicated block in 5 per-feature vitest configs goes away.
- **`assertFeatureConformance`** — reads `coverage/lcov.info` for the package at boot and asserts each band. Graceful degradation in `USE_DEV_SEED=true` (warns rather than throws when lcov is absent).
- **`pnpm coverage:diff`** — uses `baseline` for uncategorized files; stricter layer bands override per matching path glob.
This eliminates the duplication that caused the `@repo/media` drift and centralizes one decision in one place per feature.
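A minimal sketch of how a manifest's `coverage.bands` could be turned into Vitest thresholds, assuming Vitest's glob-keyed `coverage.thresholds` entries. The helper name `toVitestThresholds`, the `LAYER_GLOBS` mapping, and the `src/...` paths are hypothetical; the shipped helper may look quite different.

```ts
// Sketch only: type names, glob paths, and the helper name are assumptions.
type Band = { statements: number; branches: number; functions: number; lines: number };

interface ManifestCoverage {
  bands: { baseline: Band } & Record<string, Band>;
  mutationTargets: string[];
}

// Hypothetical mapping from a band name to the source glob it governs.
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// The baseline becomes the top-level thresholds; each stricter band becomes
// a glob-keyed entry that overrides the baseline for matching files.
function toVitestThresholds(coverage: ManifestCoverage): Record<string, number | Band> {
  const { baseline, ...layers } = coverage.bands;
  const thresholds: Record<string, number | Band> = { ...baseline };
  for (const [layer, band] of Object.entries(layers)) {
    const glob = LAYER_GLOBS[layer];
    if (glob === undefined) {
      throw new Error(`No glob registered for coverage band "${layer}"`);
    }
    thresholds[glob] = band;
  }
  return thresholds;
}
```

A per-feature `vitest.config.ts` would then pass the result as `test.coverage.thresholds`, so the numbers live only in the manifest.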
**3. Diff coverage is cover-the-diff, not cover-the-new-code.** Every changed _executable_ line must have execution-count > 0 against the merged lcov. Modified-but-not-new lines count too — catches "agent edited code, didn't update the test." Allowlist: `*.test.ts`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, plus the per-package exclude lists.
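The cover-the-diff rule can be sketched as follows. This is an illustration under stated assumptions, not the contents of `scripts/coverage/diff.mjs`: the function names are hypothetical, the changed-line map is assumed to come from a git diff computed elsewhere, and only the standard lcov `SF:` / `DA:` / `end_of_record` records are read.

```ts
// Sketch only: names and allowlist handling are assumptions.

/** Parse lcov text into file -> (line number -> execution count). */
function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | undefined;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const path = line.slice(3);
      current = files.get(path) ?? new Map<number, number>();
      files.set(path, current);
    } else if (line.startsWith("DA:") && current) {
      const [lineNo, count] = line.slice(3).split(",");
      const n = Number(lineNo);
      // Sum counts so a merged lcov with repeated records still works.
      current.set(n, (current.get(n) ?? 0) + Number(count));
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return files;
}

// Allowlist from the decision text; per-package exclude lists are omitted here.
const ALLOWLIST_SUFFIXES = [".test.ts", ".md", ".json", ".mjs"];
const isAllowlisted = (file: string): boolean =>
  ALLOWLIST_SUFFIXES.some((suffix) => file.endsWith(suffix)) || /\.config\.[^/]+$/.test(file);

interface UncoveredLine { file: string; line: number }

/** Return every changed executable line whose execution count is 0. */
function uncoveredDiffLines(lcov: string, changed: Record<string, number[]>): UncoveredLine[] {
  const coverage = parseLcov(lcov);
  const uncovered: UncoveredLine[] = [];
  for (const [file, lines] of Object.entries(changed)) {
    if (isAllowlisted(file)) continue;
    const hits = coverage.get(file);
    for (const line of lines) {
      // No DA record means lcov did not treat the line as executable: skip it.
      if (hits?.get(line) === 0) uncovered.push({ file, line });
    }
  }
  return uncovered;
}
```

Because the check runs over changed lines rather than added lines, an edit to an existing, untested branch is reported just like new code. Note that this sketch silently skips a changed file that is absent from lcov altogether; a real gate would probably need to report that case separately.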
**4. Aggregate trend ships in-tree, not via SaaS.** `coverage/summary.json` is committed on merge to main. Trend readable via `git log -- coverage/summary.json`. No external service dependency; the dispatch loop can read history without a network call.
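As an illustration of the aggregate step, the sketch below sums the standard lcov summary records (`LF`/`LH`, `BRF`/`BRH`, `FNF`/`FNH`) across several per-package lcov files and produces repo-wide percentages. The function name and the output shape are assumptions; the actual `coverage/summary.json` schema is not shown in this ADR.

```ts
// Sketch only: the function name and output shape are assumptions.
interface AggregateSummary { lines: number; branches: number; functions: number }

// Percentage rounded to two decimals; an empty metric counts as 100 (assumption).
const pct = (hit: number, found: number): number =>
  found === 0 ? 100 : Math.round((hit / found) * 10000) / 100;

function aggregateLcov(lcovTexts: string[]): AggregateSummary {
  const counters: Record<string, number> = { LF: 0, LH: 0, BRF: 0, BRH: 0, FNF: 0, FNH: 0 };
  for (const text of lcovTexts) {
    for (const raw of text.split("\n")) {
      const match = /^(LF|LH|BRF|BRH|FNF|FNH):(\d+)$/.exec(raw.trim());
      if (!match) continue;
      const key = match[1] ?? "";
      counters[key] = (counters[key] ?? 0) + Number(match[2]);
    }
  }
  return {
    lines: pct(counters["LH"] ?? 0, counters["LF"] ?? 0),
    branches: pct(counters["BRH"] ?? 0, counters["BRF"] ?? 0),
    functions: pct(counters["FNH"] ?? 0, counters["FNF"] ?? 0),
  };
}
```

Summing found/hit counts before dividing weights each package by its size, which avoids the distortion of averaging per-package percentages.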
**5. Mutation testing is opt-in and narrowly scoped.** Stryker with `@stryker-mutator/vitest-runner`, runs on `entities/` + `application/use-cases/` only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of `pnpm test`. Nightly GH Action surfaces score drift > 5%.
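The threshold and drift checks could look roughly like this. It is a sketch under assumptions: the score uses the common detected/valid definition (killed plus timed-out mutants over all valid mutants), "drift > 5%" is read as a drop of more than 5 percentage points versus the previous run, and all names are hypothetical.

```ts
// Sketch only: names and the interpretation of "drift" are assumptions.
interface MutantCounts { killed: number; timeout: number; survived: number; noCoverage: number }

/** Mutation score in percent: detected mutants over all valid mutants. */
function mutationScore(counts: MutantCounts): number {
  const detected = counts.killed + counts.timeout;
  const valid = detected + counts.survived + counts.noCoverage;
  // With nothing to mutate, treat the score as passing (assumption).
  return valid === 0 ? 100 : (detected / valid) * 100;
}

interface MutationVerdict { belowThreshold: boolean; drifted: boolean }

/** Compare a score against the per-feature threshold and the previous run. */
function checkMutation(
  current: number,
  previous: number | undefined,
  threshold = 80, // default from the decision text, tunable per manifest
  maxDrop = 5,    // percentage points, as assumed above
): MutationVerdict {
  return {
    belowThreshold: current < threshold,
    drifted: previous !== undefined && previous - current > maxDrop,
  };
}
```

Keeping the two checks separate lets the nightly job report drift even while a feature is still above its absolute threshold.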
**6. Output format is machine-first.** `pnpm coverage:diff` emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
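A sketch of the two-stream split, assuming the `status` / `uncovered[]` / `kind` shape mentioned in the shipping commit message. The `"pass"`/`"fail"` values, the per-entry fields, and the function name are placeholders, since the concrete enum values are not given in this document.

```ts
// Sketch only: status values, entry fields, and names are placeholders.
interface UncoveredEntry { file: string; line: number; kind: string }
interface DiffReport { status: "pass" | "fail"; uncovered: UncoveredEntry[] }

/** Build the machine-readable payload (stdout) and the human summary (stderr). */
function renderDiffReport(uncovered: UncoveredEntry[]): { stdout: string; stderr: string } {
  const report: DiffReport = {
    status: uncovered.length === 0 ? "pass" : "fail",
    uncovered,
  };
  const stdout = JSON.stringify(report);
  const stderr =
    uncovered.length === 0
      ? "coverage:diff: all changed executable lines are covered"
      : [
          `coverage:diff: ${uncovered.length} uncovered changed line(s)`,
          ...uncovered.map((u) => `  ${u.file}:${u.line} (${u.kind})`),
        ].join("\n");
  return { stdout, stderr };
}
```

A script would write `stdout` with `console.log` and `stderr` with `console.error`, so a caller can pipe stdout into `JSON.parse` without stripping any human-oriented text.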
## Alternatives considered
- **Codecov / Coveralls SaaS.** Polished PR comments and trend dashboards, free for OSS. Rejected as the _primary_ L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
- **Cover-the-new-code instead of cover-the-diff.** Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
- **Keep thresholds in per-package vitest configs.** Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
- **Run mutation testing in default `pnpm test`.** Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
- **Mutation testing across all layers.** Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
- **Use ESLint or fallow for diff coverage.** Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside `pnpm test`, not in `pnpm lint` or `pnpm fallow`.
- **Boot-time coverage assertion is too heavy.** Considered. Counter-argument: the assertion is `O(features × lcov-file-size)` — small numbers, ~200ms. The graceful-degradation in dev mode means contributors aren't blocked. The payoff — coverage drift caught at the same latency as TypeScript brands — justifies the machinery.
## Consequences
**Positive:**
- Every PR/task is gated on covering its own diff. An agent shipping an untested slice is blocked mechanically at the CI step.
- One source of truth per feature for coverage expectations. The `@repo/media`-style "no coverage block" drift can't recur.
- Trend history lives in the repo. `git log -- coverage/summary.json` answers "how has coverage moved over the last quarter?" without leaving the editor.
- Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
- Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 adds coverage as a sixth row alongside TS brands / ESLint / boot / `pnpm conformance` / fallow.
**Negative:**
- Implementation surface is non-trivial: 6-8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, ADR + guide + glossary + generator + hook updates.
- The boot-time assertion adds a small dependency on `coverage/lcov.info` existing. Graceful degradation in dev mode handles this, but the implementation needs care.
- `coverage/summary.json` committed on merge introduces a small CI permissions surface (`contents: write`) gated to the main-branch workflow.
- Mutation testing is slow. The nightly cadence is the compromise; on-demand `pnpm mutate` is opt-in but rare in practice.
## Implementation phasing
Shipped as a single epic over 10 commits on 2026-05-13. Per-step state:
| # | Step | Commit | Status |
| --- | ----------------------------------------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | Manifest schema + helper + auth proof-of-concept | `f7baa8b` | ✅ Shipped |
| 2 | Vitest auto-derive (helper + `DEFAULT_COVERAGE_BANDS`) | `f7baa8b` + rollouts | ✅ Shipped (all 5 features wired) |
| 3 | L1 diff coverage (`scripts/coverage/diff.mjs`) | `412d994` | ✅ Shipped |
| 4 | L2 aggregate (`scripts/coverage/aggregate.mjs` + `summary.json`) | `bd5a077` | ✅ Shipped |
| 5 | CI integration (validate gate + snapshot workflow) | `39e33eb` | ✅ Shipped |
| 6 | Helper rollout to blog + marketing-pages | `15db9c4` | ✅ Shipped |
| 7 | Docs + generator + hook rollout | `4dce1df` + `f4254aa` | ✅ Shipped |
| 8 | L3 mutation testing (Stryker + nightly Action) | `6428f10` | ✅ Shipped (auth proof-of-concept; other features can add `stryker.config.json` by `extends: "@repo/core-testing/stryker.base.json"`) |
| 9 | L0 unification (close test gaps in nav + media + marketing-pages) | `bf0b049` | ✅ Shipped — all 5 features hit declared bands |
| 10 | Boot-time `assertFeatureConformance` coverage check | — | ⏸ Deferred. Duplicates L0's structural enforcement when both readers derive from the same manifest source of truth; the drift it was supposed to catch is mechanically impossible. Revisit if a concrete need emerges. |
Repo-wide state at shipping (`coverage/summary.json`): statements 95.87% / branches 88.91% / functions 100% / lines 95.87%. All five features pass their declared 100%/100%/95%/100% bands on entities/use-cases/controllers.
## Related
- ADR-006 — vertical-feature-packages
- ADR-011 — TDD foundation
- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads `pnpm coverage:diff`)
- PRD `2026-05-13-coverage-architecture` — implementation seed