---
id: 2026-05-13-coverage-architecture
title: Agent-first coverage architecture (4 layers + manifest-driven thresholds)
type: prd
status: shipped
author: danijel
elicitation-session: brainstorm-2026-05-13
created: 2026-05-13
shipped: 2026-05-13
shipping-commits:
  - 7eb783a (PRD)
  - 4dce1df (ADR-020 + glossary + hook)
  - f7baa8b (manifest schema + helper + auth)
  - 412d994 (L1 coverage:diff)
  - bd5a077 (L2 coverage:aggregate)
  - 39e33eb (CI integration)
  - 15db9c4 (helper rollout blog + marketing-pages)
  - f4254aa (cookbook guide + generator)
  - 6428f10 (L3 Stryker mutation)
  - bf0b049 (L0 unification — all 5 features green)
---

## Problem

The template enforces "every use case has a test file" through the `usecase-must-have-test-file` ESLint rule (structural). It also declares per-layer coverage thresholds in each feature's `vitest.config.ts` (100% on entities/use-cases/controllers; 80/75/80/80 baseline). But:

- **Agents can ship slices that don't actually exercise the new code.** The ESLint rule only checks that the file exists, while coverage % measures what actually executed. No gate says "this PR's diff was tested."
- **The declared 100%-on-critical-layers thresholds may be aspirational, not enforced.** Nobody has verified whether they actually pass. CI uploads `**/coverage/lcov.info` as an artifact but doesn't gate on it.
- **There's no aggregate visibility.** There is no merged report, no trend, and no answer to "is the codebase covered well right now?"
- **100% coverage with weak assertions is invisible.** Tests that import the SUT but barely assert anything still pass coverage. The third dimension of test quality — "would my test catch a real regression?" — isn't measured.
- **Coverage expectations live in 5 separate vitest configs.** Drift across features is easy to introduce and hard to spot.

For an agent-first template where AI agents author most code in vertical slices, this is the single biggest gap between "feature shipped" and "feature shipped safely."
## Goal

Establish a 4-layer coverage architecture that mirrors the existing 5-gate conformance philosophy (multi-latency, machine-readable, agent-first). It should also make coverage a first-class conformance signal, driven from each feature's `feature.manifest.ts`.

## In scope

- A `coverage:` section in `feature.manifest.ts` as the single source of truth for per-layer expectations
- Vitest config auto-derives test-time thresholds from the manifest
- `pnpm coverage:diff` script — cover-the-diff gate (changed lines must be exercised), with machine-readable output for the dispatch loop
- `pnpm coverage:aggregate` script — merges per-package lcov into a root `coverage/lcov.info` + grep-able `coverage/summary.json`
- `coverage/summary.json` committed per merge; trend readable from `git log`
- `pnpm mutate` — Stryker on `entities/` + `application/use-cases/` only, on demand (not part of `pnpm test`)
- `assertFeatureConformance` reads the manifest's coverage band and the package's lcov at boot, and fails on drift
- CI gate: `pnpm test --coverage` (existing) + `pnpm coverage:diff` (new) + `pnpm coverage:aggregate` (new)
- ADR-020 capturing the architecture
- `docs/guides/coverage.md` cookbook
- Glossary entries for new vocabulary
- Generator update — `pnpm turbo gen feature` scaffolds the `coverage:` manifest section
- `.claude/hooks/prompt-context.sh` detects coverage-related prompts and injects pointers

## Out of scope

- **Codecov / SaaS dashboards.** The aggregate trend ships as a committed `coverage/summary.json` only. SaaS can be a later ADR.
- **Coverage badges or PR comments.** Optional follow-up if humans want them; agents don't need them.
- **Mutation testing on infrastructure / repositories / controllers** — entities + use-cases only for v1. Wider scope is a future epic.
- **Branch coverage on `__seeds__/` / `__factories__/` / `__contracts__/`** — these are already excluded from coverage and stay out.
- **Coverage for `apps/`** — apps have their own existing thresholds and are not part of this initiative.
- **Test quality beyond coverage + mutation** — property-based tests, fuzzing, etc. are separate concerns.

## Constraints

- ADR-014, ADR-017 (instrumentation): coverage instrumentation must not interfere with span/log collection.
- ADR-011 (TDD foundation): the declared per-layer thresholds (entities/use-cases/controllers at 100%; baseline 80/75/80/80) are existing decisions. We honor them and centralize where they are declared.
- The conformance ESLint rules run at AST time and check the filesystem, whereas coverage assertions rely on runtime data. They must remain in separate gates.
- Generator-first is non-negotiable: any new file added to a feature must come from `pnpm turbo gen feature` or a sibling generator.
- `pnpm conformance` and `pnpm test` already take ~120s and ~90s respectively. New gates must not add more than ~30s wall time to the default loop.
- Agent dispatch (`pnpm work dispatch`) must be able to read coverage results as JSON without a network call.

## Success criteria

- `pnpm test --coverage` passes green across all five feature packages (verifies that the L0 baseline is real, not aspirational).
- `pnpm coverage:diff` exits non-zero when a changed line is uncovered, and outputs JSON to stdout listing each uncovered hunk with file + line range.
- `pnpm coverage:aggregate` produces `coverage/lcov.info` + `coverage/summary.json` at the repo root; both are readable in <1s.
- `coverage/summary.json` is committed, and `git log -- coverage/summary.json` shows the trend over time.
- `pnpm mutate --filter ` runs Stryker on entities + use-cases and produces a per-feature mutation score.
- `feature.manifest.ts` includes `coverage: { ... }`; removing or weakening a band fails `pnpm dev` boot via `assertFeatureConformance`.
- CI fails when (a) a per-layer threshold is breached, (b) any changed line is uncovered, or (c) the aggregate report can't be produced.
- ADR-020 + `docs/guides/coverage.md` + glossary entries land alongside the implementation.
- `pnpm turbo gen feature` emits a manifest with the `coverage:` section pre-populated with defaults.

## User stories

1. As an **AI implementer agent**, I want `pnpm coverage:diff` to exit with a precise JSON list of uncovered hunks, so that I can immediately add the missing test without searching.
2. As an **AI implementer agent**, I want `pnpm dev` to refuse to boot if I've authored a manifest entry without backing tests, so that I cannot ship an untested slice.
3. As an **AI reviewer agent**, I want the dispatch loop to surface coverage drift as part of its post-task verification, so that I can flag the slice for revision.
4. As a **human reviewer**, I want `coverage/summary.json` committed on merge, so that I can see the coverage trend via `git log` without leaving the repo.
5. As a **template maintainer**, I want each feature's `feature.manifest.ts` to declare its coverage band, so that I have one place to read or change expectations.
6. As a **template maintainer**, I want `pnpm mutate` to surface tests that don't actually assert, so that 100% line coverage can't paper over weak tests.
7. As a **template adopter**, I want `pnpm turbo gen feature` to scaffold the `coverage:` manifest section with sensible defaults, so that I don't have to remember the shape.
8. As a **future agent in a session**, I want the prompt-context hook to inject coverage pointers when I mention "coverage" or "uncovered" in a prompt, so that the relevant ADR + guide load automatically.
9. As an **on-call engineer**, I want `coverage/summary.json` to include a timestamp + commit SHA, so that I can correlate coverage state with deploys.
10. As an **AI agent receiving a handoff**, I want the coverage state of the in-flight slice to be readable from a known path, so that I can continue without re-running the suite.
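
Stories 5 and 7 depend on each feature's coverage expectations living in one place, the manifest. The TypeScript below is a sketch of how that could look. It models the `coverage:` section using the default bands listed under Implementation decisions, then derives a Vitest-style `coverage.thresholds` object from it (the first of the manifest's three readers). Everything named here is an assumption for illustration: the type names, `DEFAULT_COVERAGE`, `toVitestThresholds`, the directory globs, and the reading of `100/95/100/100` as statements/branches/functions/lines. The PRD's real schema is Zod-based; plain types keep this sketch dependency-free.

```typescript
// Hypothetical shape of a feature manifest's `coverage:` section.
// Assumption: "100/95/100/100" is read as statements/branches/functions/lines.
export type CoverageBand = {
  statements: number;
  branches: number;
  functions: number;
  lines: number;
};

export type FeatureCoverage = {
  entities: CoverageBand;
  "use-cases": CoverageBand;
  controllers: CoverageBand;
  baseline: CoverageBand;
  mutationTargets: Array<"entities" | "use-cases">;
};

const band = (
  statements: number,
  branches: number,
  functions: number,
  lines: number,
): CoverageBand => ({ statements, branches, functions, lines });

// Defaults mirror the scaffolded shape listed in this PRD.
export const DEFAULT_COVERAGE: FeatureCoverage = {
  entities: band(100, 100, 100, 100),
  "use-cases": band(100, 95, 100, 100),
  controllers: band(100, 95, 100, 100),
  baseline: band(80, 75, 80, 80),
  mutationTargets: ["entities", "use-cases"],
};

// Illustrative globs (hypothetical); the real feature-package layout may differ.
const LAYER_GLOBS = {
  entities: "**/entities/**",
  "use-cases": "**/application/use-cases/**",
  controllers: "**/interface-adapters/controllers/**",
} as const;

type Layer = keyof typeof LAYER_GLOBS;

// Global numbers come from `baseline`. Each layer becomes a glob-keyed entry,
// which is the per-path form Vitest documents for `coverage.thresholds`.
export function toVitestThresholds(
  coverage: FeatureCoverage,
): Record<string, number | CoverageBand> {
  const thresholds: Record<string, number | CoverageBand> = { ...coverage.baseline };
  for (const layer of Object.keys(LAYER_GLOBS) as Layer[]) {
    thresholds[LAYER_GLOBS[layer]] = { ...coverage[layer] };
  }
  return thresholds;
}
```

A feature's `vitest.config.ts` could then pass `toVitestThresholds(manifest.coverage)` as `test.coverage.thresholds`. Recent Vitest releases document glob-pattern keys under `coverage.thresholds`, but check that the pinned version supports them before relying on per-layer enforcement.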
## Implementation decisions

### Architecture: 4 layers, mirroring the 5-gate conformance philosophy

| Layer                              | What it catches                                       | Latency            | Runs in                                                      |
| ---------------------------------- | ----------------------------------------------------- | ------------------ | ------------------------------------------------------------ |
| **L0** Per-layer vitest thresholds | Drift below declared bands (e.g., entities < 100%)    | ~5–30s per package | `pnpm test --coverage` (existing)                            |
| **L1** Diff coverage               | "Changed line was not exercised"                      | ~5s after L0       | `pnpm coverage:diff`; CI gate; dispatch post-task            |
| **L2** Aggregate trend             | "Codebase coverage drifted over time"                 | ~10s               | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing            | "Test exists, executes the code, but asserts nothing" | Minutes            | `pnpm mutate`; on demand; nightly GH Action                  |

### Manifest-driven coverage band (the keystone)

A `coverage:` section in `feature.manifest.ts` is the single source of truth. Default scaffolded shape:

- `entities`: 100/100/100/100
- `use-cases`: 100/95/100/100
- `controllers`: 100/95/100/100
- `baseline`: 80/75/80/80
- `mutationTargets`: `["entities", "use-cases"]`

Three readers:

1. **Vitest** — `vitest.config.ts` imports the manifest and emits its `coverage.thresholds`. This removes the threshold duplication that exists today across per-package configs.
2. **`assertFeatureConformance`** — reads the package's `coverage/lcov.info` at boot and asserts each band. It fails boot on drift, following the same pattern as the existing brand assertion.
3. **`pnpm coverage:diff`** — uses `baseline` as the default expectation for files without a layer tag; stricter bands override it per path.

### Diff coverage algorithm

`pnpm coverage:diff []` (default `origin/main...HEAD`):

1. Run `git diff --unified=0 --no-color ...HEAD` → list of `(file, hunk-start, hunk-end)` for changed/added lines.
2. Read the merged `coverage/lcov.info` → executable lines + execution count per file.
3. For each changed line that is _executable_ (skip comments, types, exports, declarations): assert execution count > 0.
4. Write JSON to stdout: `{ status: "pass" | "fail", uncovered: [{ file, line, kind }] }`.
5. Write a human-readable summary to stderr.
6. Exit 0 on pass, 1 on fail.

Filename allowlist (exempt from the diff gate): `*.test.ts`, `*.test.tsx`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, `*.cjs`. Paths covered by L0's existing per-package coverage excludes (DI bootstrap, interfaces, CMS, UI, factories, contracts) are also exempt.

### Aggregate report

`pnpm coverage:aggregate`:

1. Collect every `packages/**/coverage/lcov.info` + `apps/**/coverage/lcov.info`.
2. Merge them into `coverage/lcov.info` via `lcov-result-merger` or `monocart-coverage-reports`.
3. Emit `coverage/summary.json`:

   ```json
   {
     "generatedAt": "2026-05-13T12:34:56Z",
     "commit": "abc1234",
     "repo": { "statements": 87.4, "branches": 81.2, "functions": 89.0, "lines": 87.4 },
     "byPackage": {
       "@repo/auth": { "statements": 96.1, ... },
       ...
     }
   }
   ```

4. Re-render HTML at `coverage/html/index.html` for human drill-down (gitignored).

`coverage/summary.json` is committed; `coverage/lcov.info` + `coverage/html/` are gitignored.

### Mutation testing

`pnpm mutate [--filter @repo/]`:

- Uses Stryker with `@stryker-mutator/vitest-runner`.
- Per-feature `stryker.config.json` scaffolded by the generator.
- Mutation scope honors the manifest's `mutationTargets` (entities + use-cases by default).
- Output: `reports/mutation//` (HTML + JSON; gitignored).
- Mutation score threshold: 80% per feature (tunable in the manifest); not enforced in the default `pnpm test`.
- Optional nightly GH Action: `mutation-nightly.yml` — runs across all features and opens an issue when a score drops by more than 5%.

### Boot-time assertion

`assertFeatureConformance` (in `core-shared/conformance/`) gains a new check:

1. Look for `coverage/lcov.info` at the feature package root.
2. If absent: skip with a `logger.debug(...)` (dev mode is allowed to lack coverage data).
3. If present: parse it and compute per-layer % for `entities/`, `application/use-cases/`, `interface-adapters/controllers/`.
4. Compare against the manifest's `coverage:` bands. Fail boot on breach with a `CoverageDriftError`.

Graceful degradation in dev (`USE_DEV_SEED=true`): the assertion logs a warning instead of throwing, so `pnpm dev` boots without a fresh coverage run.

### CI integration

`.github/workflows/ci.yml` gains three steps after the existing test step, plus a separate post-merge workflow:

1. `pnpm coverage:aggregate` (always)
2. `pnpm coverage:diff` (always; fails the build on an uncovered diff)
3. Upload `coverage/lcov.info` + `coverage/summary.json` as an artifact (existing flow)
4. On merge to main only: commit `coverage/summary.json` back to the repo (separate workflow with `permissions: contents: write`)

### Generator integration

The `pnpm turbo gen feature` template emits:

- A manifest with a `coverage: { ... defaults ... }` section at a `` anchor
- A `vitest.config.ts` that imports the manifest and derives `coverage.thresholds` from it
- A `stryker.config.json` at the package root, scoped to `entities/` + `application/use-cases/`

The CI guard in `packages/core-eslint/anchors.test.js` is extended to cover the new anchor.

### Hook integration

`.claude/hooks/prompt-context.sh` gains a new keyword group:

```bash
# -i: match keywords case-insensitively (e.g. "Coverage" at the start of a prompt).
if echo "$prompt" | grep -qiE 'coverage|uncovered|mutation|stryker|lcov'; then
  inject+=('Coverage: ADR-020 + docs/guides/coverage.md. 4 layers: L0 vitest thresholds, L1 pnpm coverage:diff, L2 coverage/summary.json, L3 pnpm mutate. Manifest-driven via feature.manifest.ts coverage section.')
fi
```

## Testing decisions

A good test for this initiative covers behavior through the public surface:

- The diff-coverage script is tested by feeding it a synthetic `git diff` + `lcov.info` and asserting the JSON output shape and exit code. Fixtures, not e2e.
- The aggregate script is unit-tested by feeding it N synthetic per-package lcov files and asserting the merged output structure.
- `assertFeatureConformance` gets a new test in `core-shared/conformance/` that builds a manifest + a `coverage/lcov.info` fixture and asserts pass/fail behavior. No real test run is involved.
- Stryker gets an integration-style check: run `pnpm mutate --filter @repo/auth` in CI nightly and assert a mutation score > 80% on a known-good commit.
- The manifest schema gets a Zod schema test: invalid `coverage:` shapes fail parsing with a specific error message.

**Modules to add test coverage to:**

- `scripts/coverage/diff.mjs` (the diff coverage runner)
- `scripts/coverage/aggregate.mjs` (the merger)
- `packages/core-shared/src/conformance/assert-coverage.ts` (the boot-time reader + comparator)
- `packages/core-shared/src/conformance/manifest-schema.ts` (extended Zod schema)

**Prior art to mirror:**

- `scripts/work/state-builder.mjs` — for the diff-coverage script's shape (Node ESM, no deps, fixture-based test)
- `packages/core-eslint/anchors.test.js` — for the new anchor's CI guard pattern
- Existing `assertFeatureConformance` brand-assertion code — for the boot-time check shape

## Open questions

- **Q1: Mutation score threshold per feature** — start at 80% across the board, or tune per feature in the manifest? **Recommended:** start with an 80% baseline and allow a per-feature override via `coverage.mutationThreshold`.
- **Q2: When `coverage/lcov.info` is stale at boot** — fail loudly or warn? **Recommended:** warn in dev, fail in `NODE_ENV=production` boot.
- **Q3: Should `pnpm coverage:diff` run automatically as a Git pre-push hook?** **Recommended:** no — pre-commit already runs, and pre-push adds latency without proportional value. Keep it CI-only + dispatch-loop.
- **Q4: How is `coverage/summary.json` committed safely on merge to main?** **Recommended:** use a separate workflow with a github-actions[bot] identity and `permissions: contents: write`, skipped if the summary has no diff.
- **Q5: `monocart-coverage-reports` vs. `lcov-result-merger`?** **Recommended:** monocart — actively maintained, single-binary, handles V8 + Istanbul, no Python dep.

## Out of scope (deferred to future PRDs)

- **Codecov / SaaS dashboards.** The aggregate is committed; SaaS is gold-plating.
- **Mutation testing on `infrastructure/`, `interface-adapters/`, `integrations/`.** Bigger surface and slower runs; revisit after L3 v1 is stable.
- **Coverage-aware Storybook screenshot diffing.** Visual regression is its own concern.
- **Code review heuristics from coverage** (e.g., "this PR touched a 100%-covered file and dropped it to 95% — needs review tag"). Possible follow-up via fallow audit.

## L0 verification findings (2026-05-13, during brainstorm)

We ran `pnpm --filter @repo/ test -- --coverage --run` for each feature. Status of the declared 100% thresholds on entities/use-cases/controllers:

| Feature                 | State                | Detail |
| ----------------------- | -------------------- | ------ |
| `@repo/auth`            | ✅ green             | 21 tests, 93.7% overall, all per-layer bands hit 100% |
| `@repo/blog`            | ✅ green             | passed (no threshold errors surfaced) |
| `@repo/marketing-pages` | ✅ green             | passed |
| `@repo/navigation`      | ❌ real gap          | `entities/`: 86.36% lines / 50% functions; `controllers/`: 86.66% lines / 80% branches |
| `@repo/media`           | ❌ config + real gap | (1) Missing `@vitest/coverage-v8` dev dep — `--coverage` crashed. (2) `vitest.config.ts` had NO coverage block at all (no per-layer thresholds, no excludes). Applying the standard block surfaced real gaps in `controllers/` (one controller at 86.66% lines / 75% branches, with lines 19-20 uncovered). |

**Conclusions reinforcing the PRD:**

- The L0 layer is real: auth/blog/marketing-pages show that the declared 100%-on-critical-layers bar is achievable.
- The duplication problem is real: 5 features use 4 different vitest configs, and one lacks a coverage block entirely. This is exactly the drift the manifest-driven keystone eliminates.
- Real test gaps exist in 2 of 5 features (navigation, media). Fixing them belongs to the implementation epic and does not block this design from landing.

**Patches landed alongside the PRD (non-blocking for CI):**

- `packages/media/package.json` — added the `@vitest/coverage-v8` dev dep so `pnpm test -- --coverage` no longer crashes.
- `packages/media/vitest.config.ts` — left intentionally minimal (no coverage block), with a comment pointing at the L0 unification story.

The full per-layer block for media and the missing tests in navigation + media land in the **L0 unification** story of the implementation epic. That story is the first work item after the manifest schema lands.

## Further notes

- Builds on: ADR-011 (TDD foundation), ADR-006 (vertical-feature-packages), and the existing 5-gate conformance system.
- ADR-020 is the durable record of the architecture decisions; this PRD is the implementation seed.
- Each layer should land as a separate story (L0 unification, manifest schema + auto-derive, L1 diff, L2 aggregate, L3 mutation, ADR + docs). Estimated effort: one mid-sized epic, 6–8 stories.
- This is the canonical example of "agent-first observability": every layer is optimized for machine consumption first and human consumption second.
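
The diff coverage algorithm under Implementation decisions is the gate agents interact with most. The TypeScript below is a minimal sketch of its steps 1 to 4; the shipped `scripts/coverage/diff.mjs` is Node ESM and may differ. The sketch assumes that paths in the `git diff` output and lcov `SF:` records are both repo-relative. It treats a changed line as executable only when lcov has a `DA` record for it, and it skips files with no lcov record at all. The function names are hypothetical, and the PRD does not define the values of `kind`, so `"line"` is a placeholder.

```typescript
// Minimal sketch of the L1 cover-the-diff check. Names are hypothetical.
// Assumes git diff paths and lcov SF paths are both repo-relative.
export type Uncovered = { file: string; line: number; kind: "line" };
export type DiffResult = { status: "pass" | "fail"; uncovered: Uncovered[] };

// Filename allowlist from the PRD: these files are never gated.
const EXEMPT = [/\.test\.tsx?$/, /\.config\./, /\.(md|json|mjs|cjs)$/];

// Parse `git diff --unified=0` output into file -> changed/added line numbers.
export function parseDiff(diff: string): Map<string, number[]> {
  const changed = new Map<string, number[]>();
  let file: string | null = null;
  let inHeader = false;
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("diff --git ")) {
      inHeader = true;
      file = null;
      continue;
    }
    if (inHeader && raw.startsWith("+++ ")) {
      // "+++ /dev/null" marks a deleted file: nothing to check.
      file = raw === "+++ /dev/null" ? null : raw.slice(4).replace(/^b\//, "");
      inHeader = false;
      continue;
    }
    // "@@ -a[,b] +c[,d] @@": new-side lines c..c+d-1 (d defaults to 1; 0 means deletion only).
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/.exec(raw);
    if (hunk && file) {
      const start = Number(hunk[1]);
      const count = hunk[2] === undefined ? 1 : Number(hunk[2]);
      const lines = changed.get(file) ?? [];
      for (let i = 0; i < count; i++) lines.push(start + i);
      changed.set(file, lines);
    }
  }
  return changed;
}

// Parse lcov SF/DA records into file -> (line -> hit count), summing duplicate records.
export function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | null = null;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      const path = line.slice(3);
      current = files.get(path) ?? new Map<number, number>();
      files.set(path, current);
    } else if (line.startsWith("DA:") && current) {
      const [ln, hits] = line.slice(3).split(",");
      const n = Number(ln);
      current.set(n, (current.get(n) ?? 0) + Number(hits));
    } else if (line === "end_of_record") {
      current = null;
    }
  }
  return files;
}

export function diffCoverage(diff: string, lcov: string): DiffResult {
  const hits = parseLcov(lcov);
  const uncovered: Uncovered[] = [];
  for (const [file, lines] of parseDiff(diff)) {
    if (EXEMPT.some((re) => re.test(file))) continue;
    const fileHits = hits.get(file);
    // No lcov record: treated as excluded here; a stricter gate could flag it instead.
    if (!fileHits) continue;
    for (const line of lines) {
      // Lines without a DA record are treated as non-executable.
      if (fileHits.get(line) === 0) uncovered.push({ file, line, kind: "line" });
    }
  }
  return { status: uncovered.length === 0 ? "pass" : "fail", uncovered };
}
```

A CLI wrapper would read the diff and `coverage/lcov.info`, print `JSON.stringify(result)` to stdout, write a short summary to stderr, and set `process.exitCode` to 0 or 1, matching steps 4 to 6. Because executability comes from lcov `DA` records, how well "skip comments, types" holds in practice depends on how the coverage provider maps source lines.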