agentic-dev-template/docs/decisions/adr-020-coverage-architecture.md


ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)

Status: Accepted
Date: 2026-05-13
Builds on: ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
PRD: docs/work/prds/coverage-architecture.prd.md

Context

ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80) and stricter per-layer bands (100% on entities/, application/use-cases/, interface-adapters/controllers/). The ESLint conformance rule usecase-must-have-test-file enforces that a test file exists. CI runs pnpm test -- --coverage and uploads **/coverage/lcov.info as an artifact.

This leaves five gaps that matter especially in an agent-first repo:

  1. No diff-coverage gate. A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
  2. No aggregate visibility. N separate lcov files; no merged view, no trend over time.
  3. Threshold declarations are duplicated across 5+ vitest.config.ts files. The resulting drift is easy to find mechanically but nothing prevents it: the 2026-05-13 brainstorm found that @repo/media had no coverage: block at all and that @repo/navigation failed its declared layer thresholds in entities + controllers.
  4. 100% coverage with weak assertions is invisible. Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
  5. Coverage data isn't agent-readable. The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".

Decision

1. Adopt a 4-layer coverage architecture mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):

| Layer | Catches | Latency | Surface |
|---|---|---|---|
| L0 Per-layer vitest thresholds | Drift below declared bands | ~5-30s per package | pnpm test --coverage (existing) |
| L1 Diff coverage | Changed line not exercised | ~5s after L0 | pnpm coverage:diff; CI gate; dispatch post-task |
| L2 Aggregate trend | Drift across the codebase over time | ~10s | pnpm coverage:aggregate; committed coverage/summary.json |
| L3 Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | pnpm mutate; on-demand, not default pnpm test |

Each layer answers a distinct question; none replaces the others.

2. Make feature.manifest.ts the single source of truth for coverage expectations. A new coverage: section per feature:

coverage: {
  bands: {
    "entities":    { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases":   { statements: 100, branches:  95, functions: 100, lines: 100 },
    "controllers": { statements: 100, branches:  95, functions: 100, lines: 100 },
    baseline:      { statements:  80, branches:  75, functions:  80, lines:  80 },
  },
  mutationTargets: ["entities", "use-cases"],
}

Three readers consume the manifest:

  • Vitest: each per-feature vitest.config.ts imports the manifest's coverage section and emits its thresholds. The duplicated block in 5 per-feature vitest configs goes away.
  • assertFeatureConformance — reads coverage/lcov.info for the package at boot and asserts each band. Graceful degradation in USE_DEV_SEED=true (warns rather than throws when lcov is absent).
  • pnpm coverage:diff — uses baseline for uncategorized files; stricter layer bands override per matching path glob.

This eliminates the duplication that caused the @repo/media drift and centralizes one decision in one place per feature.
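
A minimal sketch of the Vitest reader, assuming each layer maps to exactly one path glob. The `Band` and `CoverageSection` types, the `LAYER_GLOBS` mapping, and the `toVitestThresholds` helper are hypothetical names used for illustration; the repo's actual helper is not shown in this ADR. The sketch relies on Vitest accepting glob-keyed entries inside `coverage.thresholds`.

```typescript
// Hypothetical types for the manifest's coverage section (names are assumptions).
type Band = { statements: number; branches: number; functions: number; lines: number };

interface CoverageSection {
  bands: { baseline: Band } & Record<string, Band>;
  mutationTargets: string[];
}

// Assumed mapping from layer name to source glob; the real package layout may differ.
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
};

// The baseline becomes the global thresholds; each layer band becomes a glob-keyed override.
function toVitestThresholds(section: CoverageSection): Record<string, number | Band> {
  const { baseline, ...layers } = section.bands;
  const thresholds: Record<string, number | Band> = { ...baseline };
  for (const [layer, band] of Object.entries(layers)) {
    const glob = LAYER_GLOBS[layer];
    if (!glob) throw new Error(`No path glob declared for layer "${layer}"`);
    thresholds[glob] = band;
  }
  return thresholds;
}

// The manifest values from the decision above.
const coverage: CoverageSection = {
  bands: {
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
    controllers: { statements: 100, branches: 95, functions: 100, lines: 100 },
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  },
  mutationTargets: ["entities", "use-cases"],
};

const thresholds = toVitestThresholds(coverage);
```

A per-feature vitest.config.ts could then pass the result as `test.coverage.thresholds`, so the manifest remains the only place where the numbers are written down.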

3. Diff coverage is cover-the-diff, not cover-the-new-code. Every changed executable line must have execution-count > 0 against the merged lcov. Modified-but-not-new lines count too — catches "agent edited code, didn't update the test." Allowlist: *.test.ts, *.config.*, *.md, *.json, *.mjs, plus the per-package exclude lists.
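
The cover-the-diff rule can be sketched as a pure function over the merged lcov text and a map of changed lines. This is an illustration, not the contents of scripts/coverage/diff.mjs: `parseLcov`, `findUncoveredChanges`, and the `changed` input (for example, derived from `git diff -U0`) are hypothetical, the per-package exclude lists are omitted, and the sketch assumes the lcov file lists every source file, including files no test imports.

```typescript
// File path -> (line number -> execution count), built from lcov SF:/DA: records.
type LineHits = Map<string, Map<number, number>>;
type UncoveredLine = { file: string; line: number };

function parseLcov(lcovText: string): LineHits {
  const hits: LineHits = new Map();
  let current: Map<number, number> | undefined;
  for (const raw of lcovText.split("\n")) {
    const record = raw.trim();
    if (record.startsWith("SF:")) {
      const file = record.slice(3);
      current = hits.get(file) ?? new Map<number, number>();
      hits.set(file, current);
    } else if (record.startsWith("DA:") && current) {
      // DA:<line>,<execution count>[,<checksum>]
      const parts = record.slice(3).split(",");
      const lineNo = Number(parts[0]);
      const count = Number(parts[1]);
      current.set(lineNo, (current.get(lineNo) ?? 0) + count);
    } else if (record === "end_of_record") {
      current = undefined;
    }
  }
  return hits;
}

// Allowlist from the decision text: *.test.ts, *.config.*, *.md, *.json, *.mjs.
const ALLOWLIST: RegExp[] = [/\.test\.ts$/, /\.config\.[^/]+$/, /\.(md|json|mjs)$/];

// Returns every changed executable line (new or modified) whose execution count is 0.
function findUncoveredChanges(hits: LineHits, changed: Map<string, number[]>): UncoveredLine[] {
  const uncovered: UncoveredLine[] = [];
  for (const [file, lines] of changed) {
    if (ALLOWLIST.some((pattern) => pattern.test(file))) continue;
    const fileHits = hits.get(file);
    for (const line of lines) {
      // Lines without a DA record are treated as non-executable (comments, types) and skipped.
      if (fileHits?.get(line) === 0) uncovered.push({ file, line });
    }
  }
  return uncovered;
}

const sampleLcov = [
  "SF:src/use-cases/login.ts",
  "DA:1,3",
  "DA:2,0",
  "DA:5,1",
  "end_of_record",
].join("\n");

const changedLines = new Map<string, number[]>([
  ["src/use-cases/login.ts", [2, 3, 5]],
  ["src/use-cases/login.test.ts", [1]],
]);

const uncoveredChanges = findUncoveredChanges(parseLcov(sampleLcov), changedLines);
```

Because the check looks at every changed line rather than only added ones, an edit to an existing use case with a stale test shows up in `uncoveredChanges` the same way a brand-new untested line does.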

4. Aggregate trend ships in-tree, not via SaaS. coverage/summary.json is committed on merge to main. Trend readable via git log -- coverage/summary.json. No external service dependency; the dispatch loop can read history without a network call.
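
One way the aggregate could be computed is by summing lcov's own summary records across packages. The sketch below is an assumption-laden illustration: the `CoverageSummary` shape and the `summarizeLcov` helper are hypothetical, and the real coverage/summary.json schema (which also reports statements) is not shown in this ADR.

```typescript
// Hypothetical shape for an aggregate snapshot; the real summary.json schema may differ.
type Metric = { found: number; hit: number; pct: number };
type CoverageSummary = { lines: Metric; branches: Metric; functions: Metric };

// Sum the lcov summary records (LF/LH lines, BRF/BRH branches, FNF/FNH functions)
// across every per-package lcov.info and convert them into repo-wide percentages.
function summarizeLcov(lcovFiles: string[]): CoverageSummary {
  const totals = { LF: 0, LH: 0, BRF: 0, BRH: 0, FNF: 0, FNH: 0 };
  for (const lcovText of lcovFiles) {
    for (const raw of lcovText.split("\n")) {
      const match = /^(LF|LH|BRF|BRH|FNF|FNH):(\d+)$/.exec(raw.trim());
      if (match) totals[match[1] as keyof typeof totals] += Number(match[2]);
    }
  }
  // Two-decimal percentage; an empty denominator counts as fully covered (an assumption).
  const metric = (found: number, hit: number): Metric => ({
    found,
    hit,
    pct: found === 0 ? 100 : Math.round((hit / found) * 10000) / 100,
  });
  return {
    lines: metric(totals.LF, totals.LH),
    branches: metric(totals.BRF, totals.BRH),
    functions: metric(totals.FNF, totals.FNH),
  };
}

const summary = summarizeLcov([
  "SF:a.ts\nFNF:2\nFNH:2\nBRF:4\nBRH:3\nLF:10\nLH:9\nend_of_record",
  "SF:b.ts\nFNF:1\nFNH:1\nBRF:0\nBRH:0\nLF:10\nLH:10\nend_of_record",
]);
// A snapshot step could serialize `summary` with JSON.stringify(summary, null, 2)
// and commit the result, which keeps the trend readable through git history.
```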

5. Mutation testing is opt-in and narrowly scoped. Stryker with @stryker-mutator/vitest-runner, runs on entities/ + application/use-cases/ only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of pnpm test. Nightly GH Action surfaces score drift > 5%.
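
The threshold and drift checks can be sketched as follows. The score formula follows Stryker's definition (detected mutants divided by valid mutants), while `MutantCounts`, `mutationScore`, and `checkMutationGate` are hypothetical names, and reading "drift > 5%" as a drop of more than 5 percentage points against the previous nightly score is an assumption.

```typescript
// Mutant counts by result status; invalid mutants (compile or runtime errors) are excluded.
interface MutantCounts {
  killed: number;
  timeout: number;
  survived: number;
  noCoverage: number;
}

// Score = detected / valid * 100, where detected = killed + timeout
// and valid = detected + survived + noCoverage.
function mutationScore(counts: MutantCounts): number {
  const detected = counts.killed + counts.timeout;
  const valid = detected + counts.survived + counts.noCoverage;
  // Assumption: a feature with no valid mutants is treated as passing.
  return valid === 0 ? 100 : (detected / valid) * 100;
}

// threshold defaults to the ADR's 80%; maxDrop is interpreted as percentage points.
function checkMutationGate(
  counts: MutantCounts,
  previousScore: number,
  threshold = 80,
  maxDrop = 5,
): { score: number; belowThreshold: boolean; drifted: boolean } {
  const score = mutationScore(counts);
  return {
    score,
    belowThreshold: score < threshold,
    drifted: previousScore - score > maxDrop,
  };
}

// 45 of 50 valid mutants detected gives a score of 90; the previous nightly scored 96.
const gate = checkMutationGate({ killed: 42, timeout: 3, survived: 4, noCoverage: 1 }, 96);
```

In this example the feature still clears the 80% threshold, but the 6-point drop from the previous run is the kind of change the nightly job is meant to surface.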

6. Output format is machine-first. pnpm coverage:diff emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
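
A small sketch of the two-channel output, assuming a report shape that this ADR does not specify. `DiffReport` and `renderDiffReport` are hypothetical; the point is only that the JSON payload and the human summary are produced separately so each can go to its own stream.

```typescript
// Hypothetical report shape; the real JSON emitted by pnpm coverage:diff may differ.
interface DiffReport {
  ok: boolean;
  checkedLines: number;
  uncovered: { file: string; line: number }[];
}

// Builds the machine-readable payload (for stdout) and the human summary (for stderr).
function renderDiffReport(report: DiffReport): { stdout: string; stderr: string } {
  const stdout = JSON.stringify(report) + "\n";
  const summaryLines = report.ok
    ? [`diff coverage OK: ${report.checkedLines} changed lines covered`]
    : [
        `diff coverage FAILED: ${report.uncovered.length} of ${report.checkedLines} changed lines not executed`,
        ...report.uncovered.map((entry) => `  ${entry.file}:${entry.line}`),
      ];
  return { stdout, stderr: summaryLines.join("\n") + "\n" };
}

const rendered = renderDiffReport({
  ok: false,
  checkedLines: 3,
  uncovered: [{ file: "src/use-cases/login.ts", line: 2 }],
});
// In a CLI entry point (not shown), rendered.stdout would be written to process.stdout,
// rendered.stderr to process.stderr, and the exit code set non-zero when ok is false.
```

Keeping stdout free of anything but JSON means the dispatch loop can parse it directly without filtering out log lines.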

Alternatives considered

  • Codecov / Coveralls SaaS. Polished PR comments and trend dashboards, free for OSS. Rejected as the primary L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
  • Cover-the-new-code instead of cover-the-diff. Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
  • Keep thresholds in per-package vitest configs. Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
  • Run mutation testing in default pnpm test. Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
  • Mutation testing across all layers. Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
  • Use ESLint or fallow for diff coverage. Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside pnpm test, not in pnpm lint or pnpm fallow.
  • Boot-time coverage assertion is too heavy. Considered. Counter-argument: the assertion is O(features × lcov-file-size) — small numbers, ~200ms. The graceful-degradation in dev mode means contributors aren't blocked. The payoff — coverage drift caught at the same latency as TypeScript brands — justifies the machinery.

Consequences

Positive:

  • Every PR/task is gated on covering its own diff. An agent can no longer ship an untested slice past the CI step.
  • One source of truth per feature for coverage expectations. The @repo/media-style "no coverage block" drift can't recur.
  • Trend history lives in the repo. git log -- coverage/summary.json answers "how has coverage moved over the last quarter?" without leaving the editor.
  • Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
  • Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
  • Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 adds the coverage row alongside TS brands / ESLint / boot / pnpm conformance / fallow.

Negative:

  • Implementation surface is non-trivial: 6-8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, and ADR + guide + glossary + generator + hook updates.
  • The boot-time assertion adds a small dependency on coverage/lcov.info existing. Graceful degradation in dev mode handles this, but the implementation needs care.
  • coverage/summary.json committed on merge introduces a small CI permissions surface (contents: write) gated to the main-branch workflow.
  • Mutation testing is slow. The nightly cadence is the compromise; on-demand pnpm mutate is opt-in but rare in practice.

Implementation phasing

Shipped as a single epic over 10 commits on 2026-05-13. Per-step state:

| # | Step | Commit | Status |
|---|---|---|---|
| 1 | Manifest schema + helper + auth proof-of-concept | f7baa8b | Shipped |
| 2 | Vitest auto-derive (helper + DEFAULT_COVERAGE_BANDS) | f7baa8b + rollouts | Shipped (all 5 features wired) |
| 3 | L1 diff coverage (scripts/coverage/diff.mjs) | 412d994 | Shipped |
| 4 | L2 aggregate (scripts/coverage/aggregate.mjs + summary.json) | bd5a077 | Shipped |
| 5 | CI integration (validate gate + snapshot workflow) | 39e33eb | Shipped |
| 6 | Helper rollout to blog + marketing-pages | 15db9c4 | Shipped |
| 7 | Docs + generator + hook rollout | 4dce1df + f4254aa | Shipped |
| 8 | L3 mutation testing (Stryker + nightly Action) | 6428f10 | Shipped (auth proof-of-concept; other features can add a stryker.config.json that sets extends: "@repo/core-testing/stryker.base.json") |
| 9 | L0 unification (close test gaps in nav + media + marketing-pages) | bf0b049 | Shipped; all 5 features hit declared bands |
| 10 | Boot-time assertFeatureConformance coverage check | | ⏸ Deferred. Duplicates L0's structural enforcement when both readers derive from the same manifest source of truth; the drift it was supposed to catch is mechanically impossible. Revisit if a concrete need emerges. |

Repo-wide state at shipping (coverage/summary.json): statements 95.87% / branches 88.91% / functions 100% / lines 95.87%. All five features pass their declared 100%/100%/95%/100% bands on entities/use-cases/controllers.

Related

  • ADR-006 — vertical-feature-packages
  • ADR-011 — TDD foundation
  • ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
  • ADR-019 — sandcastle agent orchestration (the dispatch loop that reads pnpm coverage:diff)
  • PRD coverage-architecture — implementation seed