agentic-dev/docs/decisions/adr-020-coverage-architecture.md
Danijel Martinek bae4b66fa4 refactor(work): drop date prefixes + move _state.json into _system/
Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.

Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
    -> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
    -> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
    -> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
    -> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
    -> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
    -> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
    -> docs/work/prds/ci-security-and-supply-chain.prd.md

Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.

System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
  new path.

template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).

Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
  ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
  bare-slug id field + ISO created: timestamp.

Doc reference updates: glossary, runbook, agent-first-workflow-
and-conformance, reviewer prompt, ADR-020, ADR-022, ADR-023 all
point at the new paths/slugs.
2026-05-14 21:16:51 +02:00


# ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)

**Status:** Accepted

**Date:** 2026-05-13

**Builds on:** ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)

**PRD:** `docs/work/prds/coverage-architecture.prd.md`

## Context

ADR-011 established the TDD foundation: per-package Vitest configs with V8 coverage, a coverage baseline (80/75/80/80 for statements/branches/functions/lines) and stricter per-layer bands (100% on `entities/`, `application/use-cases/`, `interface-adapters/controllers/`). The ESLint conformance rule `usecase-must-have-test-file` enforces _that a test file exists_. CI runs `pnpm test -- --coverage` and uploads `**/coverage/lcov.info` as an artifact.

This leaves five gaps that matter especially in an agent-first repo:
1. **No diff-coverage gate.** A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
2. **No aggregate visibility.** N separate lcov files; no merged view, no trend over time.
3. **Threshold declarations are duplicated** across 5+ `vitest.config.ts` files, and the copies drift in ways that are easy to spot mechanically: during the 2026-05-13 brainstorm we found that `@repo/media` had no `coverage:` block at all and that `@repo/navigation` failed its declared layer thresholds in entities and controllers.
4. **100% coverage with weak assertions is invisible.** Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
5. **Coverage data isn't agent-readable.** The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".
## Decision

**1. Adopt a 4-layer coverage architecture** mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):

| Layer | Catches | Latency | Surface |
| --- | --- | --- | --- |
| **L0** Per-layer vitest thresholds | Drift below declared bands | ~5–30s per package | `pnpm test --coverage` (existing) |
| **L1** Diff coverage | Changed line not exercised | ~5s after L0 | `pnpm coverage:diff`; CI gate; dispatch post-task |
| **L2** Aggregate trend | Drift across the codebase over time | ~10s | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing | Tests that exist and execute the code but assert nothing | Minutes | `pnpm mutate`; on-demand, not default `pnpm test` |

Each layer answers a distinct question; none replaces the others.
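
**2. Make `feature.manifest.ts` the single source of truth for coverage expectations.** A new `coverage:` section per feature:

```ts
// Excerpt from a feature's feature.manifest.ts (the surrounding manifest object is omitted).
coverage: {
  bands: {
    // Stricter per-layer bands, applied to files matching the layer's path glob.
    "entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
    // Applies to files outside any declared layer.
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  // Layers that Stryker mutates (see decision 5).
  mutationTargets: ["entities", "use-cases"],
}
```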
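
The `coverage:` section above can be given a static type so that a band with a missing metric fails at compile time, plus a small runtime check for what the type system cannot express. This is a sketch only: `CoverageSection`, `validateCoverage`, and the rule that a layer band may not be looser than `baseline` (inferred from "stricter layer bands") are assumptions, not the repo's actual schema.

```typescript
// Hypothetical typing of the manifest's `coverage:` section (not the repo's real schema).
type Metric = "statements" | "branches" | "functions" | "lines";
type Band = Record<Metric, number>;
type Layer = "entities" | "use-cases" | "controllers";

interface CoverageSection {
  bands: { baseline: Band } & Partial<Record<Layer, Band>>;
  mutationTargets: Layer[];
}

const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

// Returns human-readable problems; an empty array means the section is well-formed.
function validateCoverage(section: CoverageSection): string[] {
  const problems: string[] = [];
  const { baseline } = section.bands;
  for (const [name, band] of Object.entries(section.bands)) {
    if (band === undefined) continue;
    for (const metric of METRICS) {
      const value = band[metric];
      if (!Number.isFinite(value) || value < 0 || value > 100) {
        problems.push(`${name}.${metric} must be between 0 and 100, got ${value}`);
      } else if (name !== "baseline" && value < baseline[metric]) {
        // Assumed rule: a layer band looser than the baseline would weaken the gate.
        problems.push(`${name}.${metric} (${value}) is below baseline (${baseline[metric]})`);
      }
    }
  }
  return problems;
}

// The example section from this ADR.
const example: CoverageSection = {
  bands: {
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    controllers: { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
};
```

Calling `validateCoverage(example)` returns an empty array; setting `entities.branches` to 70 yields a single "below baseline" problem.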

Three readers consume the manifest:
- **Vitest**: `vitest.config.ts` imports the manifest's `coverage` and emits its `thresholds`. The duplicated block in 5 per-feature vitest configs goes away.
- **`assertFeatureConformance`**: reads `coverage/lcov.info` for the package at boot and asserts each band. It degrades gracefully when `USE_DEV_SEED=true`, warning rather than throwing when lcov is absent.
- **`pnpm coverage:diff`**: uses `baseline` for uncategorized files; stricter layer bands override it for files matching their path glob.

This eliminates the duplication that caused the `@repo/media` drift and keeps each feature's coverage decisions in one place.
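
As a minimal sketch of the Vitest and `pnpm coverage:diff` readers, the manifest bands can be turned into Vitest's `coverage.thresholds` object (global metrics plus glob-keyed overrides) and into a per-file band lookup. The layer-to-glob mapping in `LAYER_GLOBS`, the function names, and the substring match used in place of a real glob matcher are all assumptions for illustration.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";
type Band = Record<Metric, number>;
// `baseline` is required; layer bands are optional and keyed by layer name.
type Bands = { baseline: Band } & Partial<Record<string, Band>>;

// Hypothetical mapping from manifest layer names to source globs inside a feature package.
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// Shape of Vitest's `coverage.thresholds`: global metrics plus glob-keyed overrides.
type VitestThresholds = Partial<Band> & { [glob: string]: number | Partial<Band> };

function toVitestThresholds(bands: Bands): VitestThresholds {
  // The baseline becomes the global threshold for the package.
  const thresholds: VitestThresholds = { ...bands.baseline };
  for (const [layer, glob] of Object.entries(LAYER_GLOBS)) {
    const band = bands[layer];
    if (band !== undefined) thresholds[glob] = { ...band };
  }
  return thresholds;
}

// Band used by the diff check for one file: first matching layer, else `baseline`.
function bandFor(path: string, bands: Bands): Band {
  for (const [layer, glob] of Object.entries(LAYER_GLOBS)) {
    const band = bands[layer];
    // Simplification: treat "dir/**" as a substring match on "dir/".
    if (band !== undefined && path.includes(glob.replace(/\*\*$/, ""))) return band;
  }
  return bands.baseline;
}
```

In a `vitest.config.ts`, the result would be passed as `test.coverage.thresholds`; layers a feature does not declare simply produce no override.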

**3. Diff coverage is cover-the-diff, not cover-the-new-code.** Every changed _executable_ line must have an execution count > 0 in the merged lcov. Modified lines count as well as new ones, which catches the "agent edited code but didn't update the test" case. Allowlist: `*.test.ts`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, plus the per-package exclude lists.
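
A minimal sketch of the cover-the-diff rule, assuming the changed lines per file have already been extracted from `git diff` and that lcov `SF:` paths and diff paths are both repo-relative. It reads only `SF:`, `DA:` and `end_of_record` records, treats a changed line with no `DA:` record as non-executable, and reports changed source files absent from the lcov entirely instead of silently passing them. `parseLcov` and `checkDiffCoverage` are hypothetical names; the real `scripts/coverage/diff.mjs` may work differently.

```typescript
// file path -> (line number -> summed execution count), from lcov `DA:` records.
type LineHits = Map<string, Map<number, number>>;

function parseLcov(text: string): LineHits {
  const hits: LineHits = new Map();
  let current: Map<number, number> | undefined;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const file = line.slice(3);
      current = hits.get(file) ?? new Map<number, number>();
      hits.set(file, current);
    } else if (line.startsWith("DA:") && current !== undefined) {
      // DA:<line>,<count>[,<checksum>]; a merged lcov may repeat a file, so counts are summed.
      const [lineNo, count] = line.slice(3).split(",").map(Number);
      if (lineNo === undefined || count === undefined || Number.isNaN(lineNo) || Number.isNaN(count)) continue;
      current.set(lineNo, (current.get(lineNo) ?? 0) + count);
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return hits;
}

// Mirrors the ADR's allowlist; per-package exclude lists are omitted here.
const ALLOWLIST: RegExp[] = [/\.test\.ts$/, /\.config\.[^/]+$/, /\.md$/, /\.json$/, /\.mjs$/];

interface DiffCoverageResult {
  uncovered: Array<{ file: string; line: number }>;
  missingFiles: string[];
}

function checkDiffCoverage(changed: Map<string, number[]>, hits: LineHits): DiffCoverageResult {
  const result: DiffCoverageResult = { uncovered: [], missingFiles: [] };
  for (const [file, lines] of changed) {
    if (ALLOWLIST.some((re) => re.test(file))) continue;
    const fileHits = hits.get(file);
    if (fileHits === undefined) {
      result.missingFiles.push(file);
      continue;
    }
    for (const line of lines) {
      // No DA record: not an executable line (blank, comment, type-only), so it is skipped.
      if (fileHits.get(line) === 0) result.uncovered.push({ file, line });
    }
  }
  return result;
}
```

Under this sketch the gate fails when either `uncovered` or `missingFiles` is non-empty.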

**4. Aggregate trend ships in-tree, not via SaaS.** `coverage/summary.json` is committed on merge to main, so the trend is readable via `git log -- coverage/summary.json`. There is no external service dependency, and the dispatch loop can read history without a network call.
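
One way the aggregate step could compute repo-wide percentages is to sum raw covered/total counts across packages rather than averaging per-package percentages, so larger packages weigh proportionally. This sketch assumes each package emits Istanbul's `json-summary` report, whose `total` entry carries `total`/`covered` counts per metric; the `RepoSummary` shape written to `coverage/summary.json` is hypothetical.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";
interface Counts { total: number; covered: number }
// The part of Istanbul's json-summary output this sketch reads.
interface PackageSummary { total: Record<Metric, Counts> }
type RepoSummary = Record<Metric, Counts & { pct: number }>;

const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

function aggregate(packages: PackageSummary[]): RepoSummary {
  const summary = {} as RepoSummary;
  for (const metric of METRICS) {
    let total = 0;
    let covered = 0;
    for (const pkg of packages) {
      total += pkg.total[metric].total;
      covered += pkg.total[metric].covered;
    }
    // Count-weighted percentage rounded to two decimals; 100 when there is nothing to cover.
    const pct = total === 0 ? 100 : Math.round((covered / total) * 10000) / 100;
    summary[metric] = { total, covered, pct };
  }
  return summary;
}

// Fixed key order (METRICS) and pretty-printing keep commit-to-commit diffs small and readable.
function serialize(summary: RepoSummary): string {
  return JSON.stringify(summary, null, 2) + "\n";
}
```

With the file committed in a stable format, `git log -p -- coverage/summary.json` shows each metric's movement as a one-line change.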

**5. Mutation testing is opt-in and narrowly scoped.** Stryker with `@stryker-mutator/vitest-runner` runs on `entities/` + `application/use-cases/` only. The default mutation-score threshold is 80% per feature (tunable per manifest). It is not part of `pnpm test`; a nightly GitHub Action surfaces score drift greater than 5%.
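
For reference, Stryker's documentation defines the mutation score as detected mutants (killed or timed out) over valid mutants (detected plus survived or without coverage), excluding compile errors, runtime errors and ignored mutants. The sketch below computes that score from a list of statuses and applies the 80% default threshold and the >5% drift rule. Reading drift as an absolute change in percentage points against the previous nightly score is an assumption, as are the function names.

```typescript
// Mutant statuses as they appear in Stryker's JSON report (subset used here).
type MutantStatus =
  | "Killed" | "Survived" | "NoCoverage" | "Timeout"
  | "CompileError" | "RuntimeError" | "Ignored";

function mutationScore(statuses: MutantStatus[]): number {
  let detected = 0;
  let undetected = 0;
  for (const status of statuses) {
    if (status === "Killed" || status === "Timeout") detected++;
    else if (status === "Survived" || status === "NoCoverage") undetected++;
    // CompileError, RuntimeError and Ignored mutants do not count towards the score.
  }
  const valid = detected + undetected;
  // Sketch choice: with no valid mutants there is nothing to fail, so report 100.
  return valid === 0 ? 100 : (detected / valid) * 100;
}

interface NightlyVerdict { belowThreshold: boolean; drifted: boolean }

// Assumptions: threshold comes from the feature manifest (default 80); drift is the
// absolute change in percentage points against the previous nightly score.
function evaluateNightly(score: number, previous: number | undefined, threshold = 80, maxDrift = 5): NightlyVerdict {
  return {
    belowThreshold: score < threshold,
    drifted: previous !== undefined && Math.abs(score - previous) > maxDrift,
  };
}
```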

**6. Output format is machine-first.** `pnpm coverage:diff` emits JSON to stdout and a human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
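
A sketch of the stdout/stderr split, assuming a report shaped as an `ok` flag plus the list of uncovered changed lines; the field and function names are hypothetical. Keeping the JSON on a single stdout line lets the dispatch loop `JSON.parse` the whole stream, while the stderr wording can change freely without breaking that contract.

```typescript
interface UncoveredLine { file: string; line: number }
interface DiffCoverageReport { ok: boolean; uncovered: UncoveredLine[] }

// Builds both channels from one result so they can never disagree.
function renderReport(uncovered: UncoveredLine[]): { stdout: string; stderr: string } {
  const report: DiffCoverageReport = { ok: uncovered.length === 0, uncovered };
  const human = report.ok
    ? ["coverage:diff passed: every changed executable line was exercised."]
    : [
        `coverage:diff failed: ${uncovered.length} changed line(s) not exercised:`,
        ...uncovered.map(({ file, line }) => `  ${file}:${line}`),
      ];
  return { stdout: JSON.stringify(report) + "\n", stderr: human.join("\n") + "\n" };
}
```

A CLI wrapper would write the two strings with `process.stdout.write` and `process.stderr.write`, and exit non-zero when `ok` is false.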
## Alternatives considered
- **Codecov / Coveralls SaaS.** Polished PR comments and trend dashboards, free for OSS. Rejected as the _primary_ L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
- **Cover-the-new-code instead of cover-the-diff.** Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
- **Keep thresholds in per-package vitest configs.** Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
- **Run mutation testing in default `pnpm test`.** Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
- **Mutation testing across all layers.** Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
- **Use ESLint or fallow for diff coverage.** Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside `pnpm test`, not in `pnpm lint` or `pnpm fallow`.
- **Boot-time coverage assertion is too heavy.** Considered. Counter-argument: the assertion is `O(features × lcov-file-size)`, and both numbers are small (~200ms). Graceful degradation in dev mode means contributors aren't blocked. The payoff, coverage drift caught at the same latency as TypeScript brands, justifies the machinery.
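
To make the cost argument concrete, here is a minimal sketch of the boot-time check with its dev-mode degradation. It assumes per-layer percentages have already been computed from each feature's `coverage/lcov.info` (a single linear pass over the file), so the loop itself is proportional to features × layers. `FeatureCoverageCheck`, `assertCoverageBands`, and the `warn` callback are hypothetical names, not the real `assertFeatureConformance` API; the phasing table records this step as deferred.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";
type Band = Record<Metric, number>;
const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

interface FeatureCoverageCheck {
  feature: string;
  declared: Record<string, Band>;        // layer -> band from feature.manifest.ts
  measured: Record<string, Band> | null; // layer -> measured %, or null if lcov.info is absent
}

// Throws on any band violation; with devSeed (USE_DEV_SEED=true) a missing lcov only warns.
function assertCoverageBands(
  checks: FeatureCoverageCheck[],
  devSeed: boolean,
  warn: (message: string) => void,
): void {
  const failures: string[] = [];
  for (const { feature, declared, measured } of checks) {
    if (measured === null) {
      const message = `${feature}: coverage/lcov.info not found`;
      if (devSeed) warn(message);
      else failures.push(message);
      continue;
    }
    for (const [layer, band] of Object.entries(declared)) {
      const actual = measured[layer];
      if (actual === undefined) {
        failures.push(`${feature}/${layer}: no coverage data for this layer`);
        continue;
      }
      for (const metric of METRICS) {
        if (actual[metric] < band[metric]) {
          failures.push(`${feature}/${layer}.${metric}: ${actual[metric]}% < ${band[metric]}%`);
        }
      }
    }
  }
  if (failures.length > 0) throw new Error(`Coverage conformance failed:\n${failures.join("\n")}`);
}
```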
## Consequences
**Positive:**
- Every PR/task is gated on covering its own diff. An agent can no longer ship an untested slice past the CI step; the gate blocks it mechanically.
- One source of truth per feature for coverage expectations. The `@repo/media`-style "no coverage block" drift can't recur.
- Trend history lives in the repo. `git log -- coverage/summary.json` answers "how has coverage moved over the last quarter?" without leaving the editor.
- Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
- Coverage joins the 5-gate conformance philosophy as a first-class signal: ADR-020 adds a coverage row alongside TS brands / ESLint / boot / `pnpm conformance` / fallow.

**Negative:**
- Implementation surface is non-trivial: 6–8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, and ADR + guide + glossary + generator + hook updates.
- The boot-time assertion adds a small dependency on `coverage/lcov.info` existing. Graceful degradation in dev mode handles this, but the implementation needs care.
- `coverage/summary.json` committed on merge introduces a small CI permissions surface (`contents: write`) gated to the main-branch workflow.
- Mutation testing is slow. The nightly cadence is the compromise; on-demand `pnpm mutate` is opt-in and likely to be run rarely in practice.
## Implementation phasing
Shipped as a single epic over 10 commits on 2026-05-13. Per-step state:

| # | Step | Commit | Status |
| --- | ----------------------------------------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | Manifest schema + helper + auth proof-of-concept | `f7baa8b` | ✅ Shipped |
| 2 | Vitest auto-derive (helper + `DEFAULT_COVERAGE_BANDS`) | `f7baa8b` + rollouts | ✅ Shipped (all 5 features wired) |
| 3 | L1 diff coverage (`scripts/coverage/diff.mjs`) | `412d994` | ✅ Shipped |
| 4 | L2 aggregate (`scripts/coverage/aggregate.mjs` + `summary.json`) | `bd5a077` | ✅ Shipped |
| 5 | CI integration (validate gate + snapshot workflow) | `39e33eb` | ✅ Shipped |
| 6 | Helper rollout to blog + marketing-pages | `15db9c4` | ✅ Shipped |
| 7 | Docs + generator + hook rollout | `4dce1df` + `f4254aa` | ✅ Shipped |
| 8 | L3 mutation testing (Stryker + nightly Action) | `6428f10` | ✅ Shipped (auth proof-of-concept; other features can add `stryker.config.json` by `extends: "@repo/core-testing/stryker.base.json"`) |
| 9 | L0 unification (close test gaps in nav + media + marketing-pages) | `bf0b049` | ✅ Shipped — all 5 features hit declared bands |
| 10 | Boot-time `assertFeatureConformance` coverage check | — | ⏸ Deferred. Duplicates L0's structural enforcement when both readers derive from the same manifest source of truth; the drift it was supposed to catch is mechanically impossible. Revisit if a concrete need emerges. |

Repo-wide state at shipping (`coverage/summary.json`): statements 95.87% / branches 88.91% / functions 100% / lines 95.87%. All five features pass their declared entities/use-cases/controllers bands (100% statements, functions, and lines; 95–100% branches).
## Related
- ADR-006 — vertical-feature-packages
- ADR-011 — TDD foundation
- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads `pnpm coverage:diff`)
- PRD `coverage-architecture` — implementation seed