Closes the staleness gap after the 10-commit coverage epic shipped.

Doc sync (item 1 from the user's choice):
- CLAUDE.md Quick Start: adds pnpm coverage:aggregate / coverage:diff
/ mutate to the command listing
- CLAUDE.md: new "Sibling architecture: coverage (ADR-020)" section
after the conformance gate table — captures the 4-layer table +
points at docs/guides/coverage.md + ADR-020 + says agents must run
coverage:diff before reporting complete
- AGENTS.md preamble: now lists coverage as a parallel multi-latency
quality system alongside conformance, with the same gate / latency
framing
- PRD frontmatter: status draft -> shipped + shipped date +
shipping-commits list (all 10 SHAs anchoring the trace)
- PRD findings table: each row gets a Resolution column citing the
commit that closed it; conclusion text updated to past tense
- ADR-020 implementation phasing: rewritten as a status table with
each step linked to the commit that shipped it + Boot-time
assertFeatureConformance explicitly marked Deferred with rationale
- docs/guides/coverage.md: removed "Boot wiring lands in the next
story" line; replaced with the deferral rationale + clarified
that two readers (vitest, coverage:diff) consume the manifest

Sandcastle prompts (item 2 from the user's choice):
- .sandcastle/implementer.prompt.md: new "Coverage gates" section
after the conformance-gates list, requiring `pnpm test --coverage`,
`pnpm coverage:aggregate`, and `pnpm coverage:diff` to all pass
before reporting `complete`. Machine-readable JSON shape of
coverage:diff documented (status / uncovered[] / kind enum), with
explicit instructions on how to interpret each kind. Allowlist
expansion requires justification + test.
- .sandcastle/reviewer.prompt.md: AC coverage relabeled to "AC
coverage (acceptance criteria, not test coverage)" to disambiguate;
new check #7 "Coverage gates (ADR-020)" requiring CI's
Coverage — diff (L1) step green + per-layer thresholds met +
no silent allowlist expansion + manifest band drift detection.

Effect: future agent runs through sandcastle now treat coverage as a
first-class blocking gate, parallel to conformance. PRs no longer
discover coverage failures only via CI; the implementer is required
to check before reporting done, and the reviewer is required to
verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
117 lines
13 KiB
Markdown
# ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)

**Status:** Accepted
**Date:** 2026-05-13
**Builds on:** ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
**PRD:** docs/work/prds/2026-05-13-coverage-architecture.prd.md

## Context

ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80) and stricter per-layer bands (100% on `entities/`, `application/use-cases/`, `interface-adapters/controllers/`). The ESLint conformance rule `usecase-must-have-test-file` enforces _that a test file exists_. CI runs `pnpm test -- --coverage` and uploads `**/coverage/lcov.info` as an artifact.

This leaves five gaps that matter especially in an agent-first repo:

1. **No diff-coverage gate.** A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
2. **No aggregate visibility.** N separate lcov files; no merged view, no trend over time.
3. **Threshold declarations are duplicated** across 5+ `vitest.config.ts` files. Drift is mechanical to spot (we did it during the 2026-05-13 brainstorm: `@repo/media` had no `coverage:` block at all; `@repo/navigation` failed its declared layer thresholds in entities + controllers).
4. **100% coverage with weak assertions is invisible.** Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
5. **Coverage data isn't agent-readable.** The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".

## Decision

**1. Adopt a 4-layer coverage architecture** mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):

| Layer | Catches | Latency | Surface |
| --- | --- | --- | --- |
| **L0** Per-layer vitest thresholds | Drift below declared bands | ~5–30s per package | `pnpm test --coverage` (existing) |
| **L1** Diff coverage | Changed line not exercised | ~5s after L0 | `pnpm coverage:diff`; CI gate; dispatch post-task |
| **L2** Aggregate trend | Drift across the codebase over time | ~10s | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | `pnpm mutate`; on-demand, not default `pnpm test` |

Each layer answers a distinct question; none replaces the others.

**2. Make `feature.manifest.ts` the single source of truth for coverage expectations.** A new `coverage:` section per feature:

```ts
coverage: {
  bands: {
    "entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
}
```

Three readers were designed to consume the manifest (the boot-time reader is deferred; see step 10 under Implementation phasing):

- **Vitest** — `vitest.config.ts` imports the manifest's `coverage` and emits its `thresholds`. The duplicated block in 5 per-feature vitest configs goes away.
- **`assertFeatureConformance`** — reads `coverage/lcov.info` for the package at boot and asserts each band. Graceful degradation in `USE_DEV_SEED=true` (warns rather than throws when lcov is absent).
- **`pnpm coverage:diff`** — uses `baseline` for uncategorized files; stricter layer bands override per matching path glob.

This eliminates the duplication that caused the `@repo/media` drift and centralizes one decision in one place per feature.

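The Vitest reader described above could be sketched roughly as follows. This is a minimal illustration, not the repo's actual helper: `toVitestThresholds`, `CoverageManifest`, and the `BAND_GLOBS` path mapping are hypothetical names, and the globs are assumptions based on the layer directories named in the Context section. It relies on Vitest's `coverage.thresholds` option accepting glob-pattern keys next to the global numbers.

```ts
// Hypothetical types and helper; the names are illustrative only.
type Thresholds = { statements: number; branches: number; functions: number; lines: number };

interface CoverageManifest {
  bands: { baseline: Thresholds } & Record<string, Thresholds>;
  mutationTargets: string[];
}

// Assumed mapping from band name to source glob; the real layout may differ.
const BAND_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// Build an object suitable for `test.coverage.thresholds`:
// baseline numbers at the top level, stricter bands keyed by glob.
function toVitestThresholds(coverage: CoverageManifest): Record<string, number | Thresholds> {
  const { baseline, ...layers } = coverage.bands;
  const thresholds: Record<string, number | Thresholds> = { ...baseline };
  for (const [band, values] of Object.entries(layers)) {
    const glob = BAND_GLOBS[band];
    if (glob === undefined) {
      // Fail loudly so a misspelled band name cannot silently drop a threshold.
      throw new Error(`No glob registered for band "${band}"`);
    }
    thresholds[glob] = values;
  }
  return thresholds;
}
```

A per-feature `vitest.config.ts` would then pass the result as `coverage.thresholds`, so a feature cannot end up with a missing or divergent block.
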
**3. Diff coverage is cover-the-diff, not cover-the-new-code.** Every changed _executable_ line must have execution-count > 0 against the merged lcov. Modified-but-not-new lines count too — catches "agent edited code, didn't update the test." Allowlist: `*.test.ts`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, plus the per-package exclude lists.

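To make the cover-the-diff rule concrete, here is a minimal sketch of the core check. It is not the code in `scripts/coverage/diff.mjs`; `parseLcov` and `uncoveredChangedLines` are hypothetical names. It assumes the standard lcov tracefile records (`SF:` for the source file, `DA:<line>,<count>` for executable lines, `end_of_record`) and a precomputed map of changed line numbers per file. Allowlist handling is left out, and a file that is absent from lcov is simply skipped, which a real gate would likely need to report instead.

```ts
// Parse lcov text into: file path -> (line number -> execution count).
function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | undefined;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const path = line.slice(3);
      current = files.get(path) ?? new Map<number, number>();
      files.set(path, current);
    } else if (line.startsWith("DA:") && current) {
      const [lineNo, hits] = line.slice(3).split(",");
      const n = Number(lineNo);
      // Sum counts so a merged lcov with repeated records still works.
      current.set(n, (current.get(n) ?? 0) + Number(hits));
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return files;
}

interface UncoveredLine {
  file: string;
  line: number;
}

// Return every changed executable line whose execution count is 0.
function uncoveredChangedLines(lcov: string, changed: Record<string, number[]>): UncoveredLine[] {
  const coverage = parseLcov(lcov);
  const uncovered: UncoveredLine[] = [];
  for (const [file, lines] of Object.entries(changed)) {
    const hits = coverage.get(file);
    if (!hits) continue; // simplification: file not instrumented, skipped here
    for (const line of lines) {
      const count = hits.get(line);
      // Lines without a DA record are not executable (comments, types, blanks).
      if (count !== undefined && count === 0) uncovered.push({ file, line });
    }
  }
  return uncovered;
}
```

Because the check looks at every changed line rather than only added ones, an edit to an existing use case whose test was not updated shows up as an uncovered line.
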
**4. Aggregate trend ships in-tree, not via SaaS.** `coverage/summary.json` is committed on merge to main. Trend readable via `git log -- coverage/summary.json`. No external service dependency; the dispatch loop can read history without a network call.

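One way the aggregate numbers could be computed is sketched below. The input shape (`PackageTotals`) and the function name `aggregate` are assumptions for illustration; the actual `coverage/summary.json` schema is not specified here. The sketch sums covered and total counts across packages before dividing, so large packages weigh more than small ones.

```ts
type Metric = "statements" | "branches" | "functions" | "lines";

interface Counts {
  covered: number;
  total: number;
}

// Hypothetical per-package input: raw counts for each metric.
type PackageTotals = Record<Metric, Counts>;

// Merge per-package counts into repo-wide percentages (two decimals).
function aggregate(packages: PackageTotals[]): Record<Metric, number> {
  const metrics: Metric[] = ["statements", "branches", "functions", "lines"];
  const summary = {} as Record<Metric, number>;
  for (const metric of metrics) {
    let covered = 0;
    let total = 0;
    for (const pkg of packages) {
      covered += pkg[metric].covered;
      total += pkg[metric].total;
    }
    // A metric with nothing to cover is treated as fully covered.
    summary[metric] = total === 0 ? 100 : Math.round((covered / total) * 10000) / 100;
  }
  return summary;
}
```

Summing counts rather than averaging per-package percentages matters: a package at 90/100 and another at 50/300 aggregate to 35%, not to the 53% a naive average would suggest.
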
**5. Mutation testing is opt-in and narrowly scoped.** Stryker with `@stryker-mutator/vitest-runner`, runs on `entities/` + `application/use-cases/` only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of `pnpm test`. Nightly GH Action surfaces score drift > 5%.

**6. Output format is machine-first.** `pnpm coverage:diff` emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.

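The stdout/stderr split in item 6 can be illustrated with a small pure function. Everything specific here is an assumption: the `status` values, the `kind` string, the exit code, and the function name `render` are hypothetical and stand in for whatever shape `pnpm coverage:diff` actually emits.

```ts
// Hypothetical report shape; field values are illustrative.
interface Uncovered {
  file: string;
  line: number;
  kind: string;
}

interface DiffReport {
  status: "pass" | "fail";
  uncovered: Uncovered[];
}

interface Rendered {
  stdout: string; // machine-readable JSON for the dispatch loop
  stderr: string; // human-readable summary
  exitCode: number;
}

function render(uncovered: Uncovered[]): Rendered {
  const passed = uncovered.length === 0;
  const report: DiffReport = { status: passed ? "pass" : "fail", uncovered };
  const stderr = passed
    ? "diff coverage: all changed lines covered"
    : [
        `diff coverage: ${uncovered.length} uncovered changed line(s)`,
        ...uncovered.map((u) => `  ${u.file}:${u.line} (${u.kind})`),
      ].join("\n");
  return { stdout: JSON.stringify(report), stderr, exitCode: passed ? 0 : 1 };
}

// A CLI entry point would write each stream separately, for example:
//   const out = render(found);
//   process.stdout.write(out.stdout + "\n");
//   process.stderr.write(out.stderr + "\n");
//   process.exitCode = out.exitCode;
```

Keeping stdout free of anything but JSON means a consumer can pipe it straight into `JSON.parse` without filtering log lines.
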
## Alternatives considered

- **Codecov / Coveralls SaaS.** Polished PR comments and trend dashboards, free for OSS. Rejected as the _primary_ L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
- **Cover-the-new-code instead of cover-the-diff.** Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
- **Keep thresholds in per-package vitest configs.** Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
- **Run mutation testing in default `pnpm test`.** Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
- **Mutation testing across all layers.** Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
- **Use ESLint or fallow for diff coverage.** Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside `pnpm test`, not in `pnpm lint` or `pnpm fallow`.
- **Boot-time coverage assertion is too heavy.** Considered. Counter-argument: the assertion is `O(features × lcov-file-size)` — small numbers, ~200ms. The graceful-degradation in dev mode means contributors aren't blocked. The payoff — coverage drift caught at the same latency as TypeScript brands — justifies the machinery.

## Consequences

**Positive:**

- Every PR/task is gated on covering its own diff. An agent shipping an untested slice becomes mechanically impossible at the CI step.
- One source of truth per feature for coverage expectations. The `@repo/media`-style "no coverage block" drift can't recur.
- Trend history lives in the repo. `git log -- coverage/summary.json` answers "how has coverage moved over the last quarter?" without leaving the editor.
- Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
- Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 adds the coverage row alongside TS brands / ESLint / boot / `pnpm conformance` / fallow.

**Negative:**

- Implementation surface is non-trivial: 6–8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, ADR + guide + glossary + generator + hook updates.
- The boot-time assertion adds a small dependency on `coverage/lcov.info` existing. Graceful degradation in dev mode handles this, but the implementation needs care.
- `coverage/summary.json` committed on merge introduces a small CI permissions surface (`contents: write`) gated to the main-branch workflow.
- Mutation testing is slow. The nightly cadence is the compromise; on-demand `pnpm mutate` is opt-in but rare in practice.

## Implementation phasing

Shipped as a single epic over 10 commits on 2026-05-13. Per-step state:

| # | Step | Commit | Status |
| --- | --- | --- | --- |
| 1 | Manifest schema + helper + auth proof-of-concept | `f7baa8b` | ✅ Shipped |
| 2 | Vitest auto-derive (helper + `DEFAULT_COVERAGE_BANDS`) | `f7baa8b` + rollouts | ✅ Shipped (all 5 features wired) |
| 3 | L1 diff coverage (`scripts/coverage/diff.mjs`) | `412d994` | ✅ Shipped |
| 4 | L2 aggregate (`scripts/coverage/aggregate.mjs` + `summary.json`) | `bd5a077` | ✅ Shipped |
| 5 | CI integration (validate gate + snapshot workflow) | `39e33eb` | ✅ Shipped |
| 6 | Helper rollout to blog + marketing-pages | `15db9c4` | ✅ Shipped |
| 7 | Docs + generator + hook rollout | `4dce1df` + `f4254aa` | ✅ Shipped |
| 8 | L3 mutation testing (Stryker + nightly Action) | `6428f10` | ✅ Shipped (auth proof-of-concept; other features can add `stryker.config.json` by `extends: "@repo/core-testing/stryker.base.json"`) |
| 9 | L0 unification (close test gaps in nav + media + marketing-pages) | `bf0b049` | ✅ Shipped — all 5 features hit declared bands |
| 10 | Boot-time `assertFeatureConformance` coverage check | — | ⏸ Deferred. Duplicates L0's structural enforcement when both readers derive from the same manifest source of truth; the drift it was supposed to catch is mechanically impossible. Revisit if a concrete need emerges. |

Repo-wide state at shipping (`coverage/summary.json`): statements 95.87% / branches 88.91% / functions 100% / lines 95.87%. All five features pass their declared 100%/100%/95%/100% bands on entities/use-cases/controllers.

## Related

- ADR-006 — vertical-feature-packages
- ADR-011 — TDD foundation
- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads `pnpm coverage:diff`)
- PRD `2026-05-13-coverage-architecture` — implementation seed