From 4dce1df0844f8a4a9d7f5eadd3243ed8f759d5dd Mon Sep 17 00:00:00 2001
From: Danijel Martinek
Date: Wed, 13 May 2026 13:42:26 +0200
Subject: [PATCH] docs(coverage): ADR-020 + glossary entries + hook keyword group
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Architecture record for the agent-first coverage initiative seeded by the 2026-05-13 PRD. Captures the durable decisions:

- 4-layer architecture (L0 vitest, L1 diff, L2 aggregate, L3 mutation)
- Manifest-driven coverage band as single source of truth (vitest + assertFeatureConformance + pnpm coverage:diff all read from it)
- Cover-the-diff (changed lines), not cover-the-new-code
- Committed coverage/summary.json (no SaaS), trend via git log
- Mutation testing scoped to entities + use-cases, on-demand only
- Machine-first output format (JSON stdout, human stderr)

Glossary gets a new "Coverage" section with 7 entries (coverage band, L0-L3 layers, diff coverage, mutation testing, mutation score, coverage/summary.json), plus two relationship rows and a flagged ambiguity for "coverage" qualifiers.

prompt-context.sh hook gets a 9th keyword group — when a prompt mentions coverage / uncovered / lcov / mutation / stryker, the relevant ADR + guide path are injected as additional context for the turn.

This is the documentation layer of the coverage epic. Implementation (manifest schema, vitest auto-derive, scripts, boot assertion, mutation tooling) lands in subsequent stories.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .claude/hooks/prompt-context.sh | 3 +
 .../adr-020-coverage-architecture.md | 110 ++++++++++++++++++
 docs/glossary.md | 24 ++++
 3 files changed, 137 insertions(+)
 create mode 100644 docs/decisions/adr-020-coverage-architecture.md

diff --git a/.claude/hooks/prompt-context.sh b/.claude/hooks/prompt-context.sh
index 4101e91..6d504a7 100755
--- a/.claude/hooks/prompt-context.sh
+++ b/.claude/hooks/prompt-context.sh
@@ -34,6 +34,9 @@ fi
 if echo "$prompt" | grep -qE 'boundary|boundaries|cross-package|cross feature'; then
   inject+=('Boundaries: ADR-006 + ADR-010. Five tags (app|core|core-composition|feature|tooling). Features may only depend on core + tooling. Enforced by ESLint + Turborepo boundaries.')
 fi
+if echo "$prompt" | grep -qE 'coverage|uncovered|lcov|mutation|stryker|coverage band'; then
+  inject+=('Coverage: ADR-020 + docs/guides/coverage.md (cookbook). 4 layers — L0 vitest thresholds, L1 pnpm coverage:diff (cover-the-diff), L2 coverage/summary.json (committed trend), L3 pnpm mutate (Stryker on entities + use-cases). Manifest-driven: feature.manifest.ts coverage.bands is the single source of truth.')
+fi
 
 if [ ${#inject[@]} -gt 0 ]; then
   echo "=== context-relevant pointers (from .claude/hooks/prompt-context.sh) ==="
diff --git a/docs/decisions/adr-020-coverage-architecture.md b/docs/decisions/adr-020-coverage-architecture.md
new file mode 100644
index 0000000..052f62b
--- /dev/null
+++ b/docs/decisions/adr-020-coverage-architecture.md
@@ -0,0 +1,110 @@
+# ADR-020 — Agent-first coverage architecture (4 layers + manifest-driven thresholds)
+
+**Status:** Accepted
+**Date:** 2026-05-13
+**Builds on:** ADR-006 (vertical-feature-packages), ADR-011 (TDD foundation)
+**PRD:** docs/work/prds/2026-05-13-coverage-architecture.prd.md
+
+## Context
+
+ADR-011 established the TDD foundation: per-package vitest configs with V8 coverage, a coverage baseline (80/75/80/80) and stricter per-layer bands (100% on `entities/`, `application/use-cases/`, `interface-adapters/controllers/`). The ESLint conformance rule `usecase-must-have-test-file` enforces _that a test file exists_. CI runs `pnpm test -- --coverage` and uploads `**/coverage/lcov.info` as an artifact.
+
+This leaves five gaps that matter especially in an agent-first repo:
+
+1. **No diff-coverage gate.** A slice can ship without exercising its new lines. The ESLint rule only checks file presence; the per-package thresholds catch only large drops.
+2. **No aggregate visibility.** N separate lcov files; no merged view, no trend over time.
+3. **Threshold declarations are duplicated** across 5+ `vitest.config.ts` files. Drift is mechanical to spot (we did it during the 2026-05-13 brainstorm: `@repo/media` had no `coverage:` block at all; `@repo/navigation` failed its declared layer thresholds in entities + controllers).
+4. **100% coverage with weak assertions is invisible.** Coverage doesn't measure whether tests would catch real regressions. Mutation testing — explicitly deferred in ADR-011 — is the next signal.
+5. **Coverage data isn't agent-readable.** The HTML report serves humans; the dispatch loop has no way to ask "did my slice cover its diff?".
+
+## Decision
+
+**1. Adopt a 4-layer coverage architecture** mirroring the 5-gate conformance philosophy (multi-latency, machine-readable, agent-first):
+
+| Layer | Catches | Latency | Surface |
+| ---------------------------------- | ---------------------------------------------------- | ------------------ | ------------------------------------------------------------ |
+| **L0** Per-layer vitest thresholds | Drift below declared bands | ~5–30s per package | `pnpm test --coverage` (existing) |
+| **L1** Diff coverage | Changed line not exercised | ~5s after L0 | `pnpm coverage:diff`; CI gate; dispatch post-task |
+| **L2** Aggregate trend | Drift across the codebase over time | ~10s | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
+| **L3** Mutation testing | Tests that exist + execute the code + assert nothing | Minutes | `pnpm mutate`; on-demand, not default `pnpm test` |
+
+Each layer answers a distinct question; none replaces the others.
+
+**2. Make `feature.manifest.ts` the single source of truth for coverage expectations.** A new `coverage:` section per feature:
+
+```ts
+coverage: {
+  bands: {
+    "entities": { statements: 100, branches: 100, functions: 100, lines: 100 },
+    "use-cases": { statements: 100, branches: 95, functions: 100, lines: 100 },
+    "controllers": { statements: 100, branches: 95, functions: 100, lines: 100 },
+    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
+  },
+  mutationTargets: ["entities", "use-cases"],
+}
+```
+
+Three readers consume the manifest:
+
+- **Vitest** — `vitest.config.ts` imports the manifest's `coverage` and emits its `thresholds`. The duplicated block in 5 per-feature vitest configs goes away.
+- **`assertFeatureConformance`** — reads `coverage/lcov.info` for the package at boot and asserts each band. Graceful degradation in `USE_DEV_SEED=true` (warns rather than throws when lcov is absent).
+- **`pnpm coverage:diff`** — uses `baseline` for uncategorized files; stricter layer bands override per matching path glob.
+
+This eliminates the duplication that caused the `@repo/media` drift and centralizes one decision in one place per feature.
+
+**3. Diff coverage is cover-the-diff, not cover-the-new-code.** Every changed _executable_ line must have execution-count > 0 against the merged lcov. Modified-but-not-new lines count too — catches "agent edited code, didn't update the test." Allowlist: `*.test.ts`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, plus the per-package exclude lists.
+
+**4. Aggregate trend ships in-tree, not via SaaS.** `coverage/summary.json` is committed on merge to main. Trend readable via `git log -- coverage/summary.json`. No external service dependency; the dispatch loop can read history without a network call.
+
+**5. Mutation testing is opt-in and narrowly scoped.** Stryker with `@stryker-mutator/vitest-runner`, runs on `entities/` + `application/use-cases/` only. Default mutation-score threshold 80% per feature (tunable per-manifest). Not part of `pnpm test`. Nightly GH Action surfaces score drift > 5%.
+
+**6. Output format is machine-first.** `pnpm coverage:diff` emits JSON to stdout; human-readable summary to stderr. The dispatch loop reads stdout; humans read stderr or the HTML report.
+
+## Alternatives considered
+
+- **Codecov / Coveralls SaaS.** Polished PR comments and trend dashboards, free for OSS. Rejected as the _primary_ L2 store — adds an external dep, makes the dispatch loop dependent on a network call, and the PR-comment UX targets humans (not the primary consumer of this signal). Can be added later as gold-plating without disturbing the architecture.
+- **Cover-the-new-code instead of cover-the-diff.** Lighter touch; ignores modified lines. Rejected — catches less drift. A slice that edits a use case without updating its test should fail, and cover-the-new-code wouldn't notice.
+- **Keep thresholds in per-package vitest configs.** Status quo. Rejected — the 2026-05-13 audit found drift in 2 of 5 features (media had no block at all; navigation's block diverged subtly from the canonical). Manifest centralization is the only durable fix.
+- **Run mutation testing in default `pnpm test`.** Rejected — Stryker on entities + use-cases takes minutes. Adding minutes to the default loop violates the constraint ("new gates must not add more than ~30s wall time"). On-demand is the right cadence; nightly catches drift.
+- **Mutation testing across all layers.** Rejected for v1 — repository/controller/integration code has too many environmental dependencies to mutate cleanly. Start narrow; expand if signal is high.
+- **Use ESLint or fallow for diff coverage.** Rejected — diff coverage needs runtime data (which lines actually executed), not AST or filesystem state. It belongs alongside `pnpm test`, not in `pnpm lint` or `pnpm fallow`.
+- **Boot-time coverage assertion is too heavy.** Considered. Counter-argument: the assertion is `O(features × lcov-file-size)` — small numbers, ~200ms. The graceful-degradation in dev mode means contributors aren't blocked. The payoff — coverage drift caught at the same latency as TypeScript brands — justifies the machinery.
+
+## Consequences
+
+**Positive:**
+
+- Every PR/task is gated on covering its own diff. Agent shipping an untested slice becomes mechanically impossible at the CI step.
+- One source of truth per feature for coverage expectations. The `@repo/media`-style "no coverage block" drift can't recur.
+- Trend history lives in the repo. `git log -- coverage/summary.json` answers "how has coverage moved over the last quarter?" without leaving the editor.
+- Mutation testing on the highest-leverage layers (entities + use-cases — the pure-business-logic surface) raises the floor on test quality without slowing the dispatch loop.
+- Machine-readable diff-coverage output integrates directly with the dispatch loop's post-task verification, completing the agent-first observability story.
+- Coverage joins the 5-gate conformance philosophy as a first-class signal; ADR-020 becomes the row alongside TS brands / ESLint / boot / `pnpm conformance` / fallow / coverage.
+
+**Negative:**
+
+- Implementation surface is non-trivial: 6–8 stories spanning manifest schema, vitest auto-derive, two new scripts, boot-time assertion, mutation tooling, ADR + guide + glossary + generator + hook updates.
+- The boot-time assertion adds a small dependency on `coverage/lcov.info` existing. Graceful degradation in dev mode handles this, but the implementation needs care.
+- `coverage/summary.json` committed on merge introduces a small CI permissions surface (`contents: write`) gated to the main-branch workflow.
+- Mutation testing is slow. The nightly cadence is the compromise; on-demand `pnpm mutate` is opt-in but rare in practice.
+
+## Implementation phasing
+
+In order (each landable independently):
+
+1. **L0 unification** — fix `@repo/media` (missing dep + missing config block) and `@repo/navigation` (real test gaps) so every feature passes its declared bands today.
+2. **Manifest schema** — extend `feature.manifest.ts` shape (Zod schema in `core-shared/conformance/`) with the `coverage:` section.
+3. **Vitest auto-derive** — `vitest.config.ts` per feature imports the manifest and emits `coverage.thresholds`. Eliminates duplication.
+4. **L1 diff coverage** — `scripts/coverage/diff.mjs` + `pnpm coverage:diff` script + CI gate.
+5. **L2 aggregate** — `scripts/coverage/aggregate.mjs` + `pnpm coverage:aggregate` + summary.json + merge-to-main workflow.
+6. **L3 mutation testing** — Stryker setup + `pnpm mutate` + nightly GH Action.
+7. **Boot-time `assertFeatureConformance`** — coverage band read against lcov.
+8. **Docs + generator + hook rollout** — this ADR (now landing), `docs/guides/coverage.md`, glossary entries, `pnpm turbo gen feature` template update, `.claude/hooks/prompt-context.sh` keyword group.
+
+## Related
+
+- ADR-006 — vertical-feature-packages
+- ADR-011 — TDD foundation
+- ADR-018 — audit-and-compliance (similar manifest-declared shape pattern)
+- ADR-019 — sandcastle agent orchestration (the dispatch loop that reads `pnpm coverage:diff`)
+- PRD `2026-05-13-coverage-architecture` — implementation seed
diff --git a/docs/glossary.md b/docs/glossary.md
index 784dcaf..d06896a 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -124,6 +124,27 @@ The whole-codebase auditor (`pnpm fallow`) — dead exports, unused files, dupli
 **Drift**:
 Any disagreement between a feature's manifest and its code. The conformance gates are designed to catch drift at the earliest possible latency.
 
+## Coverage
+
+**Coverage band**:
+The per-path threshold declared in `feature.manifest.ts` under `coverage.bands` — one entry per Clean Architecture layer (`entities`, `use-cases`, `controllers`) plus a `baseline` for everything else. Single source of truth read by vitest, `assertFeatureConformance`, and `pnpm coverage:diff`. See ADR-020.
+
+**L0 / L1 / L2 / L3**:
+The four coverage layers (ADR-020). **L0** = vitest per-layer thresholds (test-time). **L1** = `pnpm coverage:diff` cover-the-diff gate (post-test, CI + dispatch loop). **L2** = `pnpm coverage:aggregate` → committed `coverage/summary.json` (observability). **L3** = `pnpm mutate` Stryker mutation testing on entities + use-cases (on-demand, not default).
+
+**Diff coverage**:
+The gate that asserts every changed _executable_ line has execution count > 0 in the merged lcov. Cover-the-diff (modified + new lines both count), not cover-the-new-code. Run via `pnpm coverage:diff []`.
+_Avoid:_ patch coverage, delta coverage (use "diff coverage" canonically).
+
+**Mutation testing**:
+A test-quality signal that mutates source code and re-runs tests; surviving mutants mean the test exists + executes the code but doesn't actually assert behavior. Scoped to `entities/` + `application/use-cases/` per feature. Run via `pnpm mutate [--filter @repo/]`.
+
+**Mutation score**:
+The percentage of mutants killed (i.e., caught by tests) out of all mutants generated. Per-feature threshold defaults to 80% (overridable in `feature.manifest.ts` via `coverage.mutationThreshold`).
+
+**`coverage/summary.json`**:
+The aggregated per-package + repo-level coverage snapshot, committed on merge to main. Grep-able from git history via `git log -- coverage/summary.json`. Includes timestamp + commit SHA for correlation with deploys.
+
 ## Cross-feature mechanisms
 
 **Event bus** (`IEventBus`):
@@ -232,6 +253,8 @@ The shipping rhythm. One vertical slice closes one task, becomes one PR, lands a
 - A **Channel descriptor** is public; its **Realtime handler** is always private.
 - **Brands** are attached only at **DI bind time**, by **`withSpan` / `withCapture` / `withAudit`**.
 - **Conformance** asserts the **Manifest** and code agree, at five latency tiers.
+- A **Coverage band** in the **Manifest** is read by Vitest (L0), `assertFeatureConformance` (boot), and `pnpm coverage:diff` (L1) — one decision, three enforcers.
+- **Mutation testing** is the third dimension of test quality, after "test file exists" (ESLint structural) and "test executes the code" (L0/L1 coverage).
 
 ## Flagged ambiguities
 
@@ -244,6 +267,7 @@ The shipping rhythm. One vertical slice closes one task, becomes one PR, lands a
 - **"spec"** — `docs/superpowers/specs/` design doc; never a PRD or a Jest spec file.
 - **"controller"** — one verb-noun pair per file; never a multi-method MVC controller.
 - **"module"** — avoid for packages (use "package"); reserve for Node ESM/CJS module semantics if needed.
+- **"coverage"** — qualify when ambiguous: **L0 coverage** (per-layer thresholds, test-time) | **diff coverage / L1** (cover-the-diff gate) | **aggregate coverage / L2** (committed summary.json) | **mutation coverage / L3** (Stryker mutation score, not line coverage).
 
 ## Cross-references