---
id: 2026-05-13-coverage-architecture
title: Agent-first coverage architecture (4 layers + manifest-driven thresholds)
type: prd
status: shipped
author: danijel
elicitation-session: brainstorm-2026-05-13
created: 2026-05-13
shipped: 2026-05-13
shipping-commits:
  - 7eb783a (PRD)
  - 4dce1df (ADR-020 + glossary + hook)
  - f7baa8b (manifest schema + helper + auth)
  - 412d994 (L1 coverage:diff)
  - bd5a077 (L2 coverage:aggregate)
  - 39e33eb (CI integration)
  - 15db9c4 (helper rollout blog + marketing-pages)
  - f4254aa (cookbook guide + generator)
  - 6428f10 (L3 Stryker mutation)
  - bf0b049 (L0 unification — all 5 features green)
---

## Problem

The template enforces "every use case has a test file" via the `usecase-must-have-test-file` ESLint rule (structural) and declares per-layer coverage thresholds in each feature's `vitest.config.ts` (100% on entities/use-cases/controllers; 80/75/80/80 baseline). But:

- **Agents can ship slices that don't actually exercise the new code.** The ESLint rule only checks file presence; coverage % checks what executed. There's no gate that says "this PR's diff was tested."
- **The declared 100%-on-critical-layers thresholds may be aspirational, not enforced.** Their actual green/red state is unverified. CI uploads `**/coverage/lcov.info` as an artifact but doesn't gate on it.
- **There's no aggregate visibility.** No merged report, no trend, no "is the codebase covered well right now?" answer.
- **100% coverage with weak assertions is invisible.** Tests that import the SUT but barely assert anything pass coverage. The third dimension of test quality — "would my test catch a real regression?" — isn't measured.
- **Coverage expectations live in 5 separate vitest configs.** Drift across features is easy and hard to spot.

For an agent-first template where most code is authored by AI agents in vertical slices, this is the single biggest gap between "feature shipped" and "feature shipped safely."

## Goal

Establish a 4-layer coverage architecture that mirrors the existing 5-gate conformance philosophy (multi-latency, machine-readable, agent-first) and makes coverage a first-class conformance signal driven from each feature's `feature.manifest.ts`.

## In scope

- A `coverage:` section in `feature.manifest.ts` as the single source of truth for per-layer expectations
- Vitest config auto-derives test-time thresholds from the manifest
- `pnpm coverage:diff` script — cover-the-diff gate (changed lines must be exercised), machine-readable output for the dispatch loop
- `pnpm coverage:aggregate` script — merges per-package lcov into a root `coverage/lcov.info` + grep-able `coverage/summary.json`
- `coverage/summary.json` committed per merge; trend readable from `git log`
- `pnpm mutate` — Stryker on `entities/` + `application/use-cases/` only, on-demand (not part of `pnpm test`)
- `assertFeatureConformance` reads the manifest's coverage band and the package's lcov at boot; fails on drift
- CI gate: `pnpm test --coverage` (existing) + `pnpm coverage:diff` (new) + `pnpm coverage:aggregate` (new)
- ADR-020 capturing the architecture
- `docs/guides/coverage.md` cookbook
- Glossary entries for new vocabulary
- Generator update — `pnpm turbo gen feature` scaffolds the `coverage:` manifest section
- `.claude/hooks/prompt-context.sh` detects coverage-related prompts and injects pointers

## Out of scope

- **Codecov / SaaS dashboards.** Aggregate trend ships as committed `coverage/summary.json` only. SaaS can be a later ADR.
- **Coverage badges or PR comments.** Optional follow-up if humans want them; agents don't.
- **Mutation testing on infrastructure / repositories / controllers** — entities + use-cases only for v1. Wider scope is a future epic.
- **Branch coverage on `__seeds__/` / `__factories__/` / `__contracts__/`** — already excluded from coverage; stays out.
- **Coverage for `apps/`** — they have their own existing thresholds; not part of this initiative.
- **Test quality beyond coverage + mutation** — property-based tests, fuzzing, etc. are separate concerns.

## Constraints

- ADR-014, ADR-017 (instrumentation): coverage instrumentation must not interfere with span/log collection.
- ADR-011 (TDD foundation): the declared per-layer thresholds (entities/use-cases/controllers at 100%; baseline 80/75/80/80) are existing decisions; we honor them and centralize their declaration.
- The conformance ESLint rules are AST-time + filesystem; coverage assertions are runtime data — they must remain in separate gates.
- Generator-first is non-negotiable: any new file added to a feature must come from `pnpm turbo gen feature` or a sibling generator.
- `pnpm conformance` and `pnpm test` already take ~120s and ~90s respectively; new gates must not add more than ~30s wall time to the default loop.
- Agent dispatch (`pnpm work dispatch`) must be able to read coverage results as JSON without a network call.

## Success criteria

- `pnpm test --coverage` passes green across all five feature packages (verifies L0 baseline is real, not aspirational).
- `pnpm coverage:diff` exits non-zero when a changed line is uncovered; outputs JSON to stdout listing each uncovered hunk with file + line range.
- `pnpm coverage:aggregate` produces `coverage/lcov.info` + `coverage/summary.json` at the repo root; both readable in <1s.
- `coverage/summary.json` is committed and `git log -- coverage/summary.json` shows trend over time.
- `pnpm mutate --filter <feature>` runs Stryker on entities + use-cases; produces a per-feature mutation score.
- `feature.manifest.ts` includes `coverage: { ... }`; removing or weakening a band fails `pnpm dev` boot via `assertFeatureConformance`.
- CI fails when (a) a per-layer threshold breach happens, (b) any changed line is uncovered, (c) the aggregate report can't be produced.
- ADR-020 + `docs/guides/coverage.md` + glossary entries land alongside implementation.
- `pnpm turbo gen feature` emits a manifest with the `coverage:` section pre-populated to defaults.

## User stories

1. As an **AI implementer agent**, I want `pnpm coverage:diff` to exit with a precise JSON list of uncovered hunks, so that I can immediately add the missing test without searching.
2. As an **AI implementer agent**, I want `pnpm dev` to refuse to boot if I've authored a manifest entry without backing tests, so that I cannot ship an untested slice.
3. As an **AI reviewer agent**, I want the dispatch loop to surface coverage drift as part of its post-task verification, so that I can flag the slice for revision.
4. As a **human reviewer**, I want `coverage/summary.json` committed on merge, so that I can see coverage trend via `git log` without leaving the repo.
5. As a **template maintainer**, I want each feature's `feature.manifest.ts` to declare its coverage band, so that I have one place to read or change expectations.
6. As a **template maintainer**, I want `pnpm mutate` to surface tests that don't actually assert, so that 100% line coverage can't paper over weak tests.
7. As a **template adopter**, I want `pnpm turbo gen feature` to scaffold the `coverage:` manifest section with sensible defaults, so that I don't have to remember the shape.
8. As a **future agent in a session**, I want the prompt-context hook to inject coverage pointers when I mention "coverage" or "uncovered" in a prompt, so that the relevant ADR + guide load automatically.
9. As an **on-call engineer**, I want `coverage/summary.json` to include a timestamp + commit SHA, so that I can correlate coverage state with deploys.
10. As an **AI agent receiving a handoff**, I want the coverage state of the in-flight slice to be readable from a known path, so that I can continue without re-running the suite.

## Implementation decisions

### Architecture: 4 layers, mirroring the 5-gate conformance philosophy

| Layer                              | What it catches                                       | Latency            | Runs in                                                      |
| ---------------------------------- | ----------------------------------------------------- | ------------------ | ------------------------------------------------------------ |
| **L0** Per-layer vitest thresholds | Drift below declared bands (e.g., entities < 100%)    | ~5–30s per package | `pnpm test --coverage` (existing)                            |
| **L1** Diff coverage               | "Changed line was not exercised"                      | ~5s after L0       | `pnpm coverage:diff`; CI gate; dispatch post-task            |
| **L2** Aggregate trend             | "Codebase coverage drifted over time"                 | ~10s               | `pnpm coverage:aggregate`; committed `coverage/summary.json` |
| **L3** Mutation testing            | "Test exists, executes the code, but asserts nothing" | Minutes            | `pnpm mutate`; on-demand; nightly GH Action                  |

### Manifest-driven coverage band (the keystone)

A `coverage:` section in `feature.manifest.ts` is the single source of truth. Default scaffolded shape:

- `entities`: 100/100/100/100
- `use-cases`: 100/95/100/100
- `controllers`: 100/95/100/100
- `baseline`: 80/75/80/80
- `mutationTargets`: `["entities", "use-cases"]`
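
A minimal sketch of how the default shape could be typed, assuming each four-number band reads statements/branches/functions/lines (the key order used in `coverage/summary.json`). The names `CoverageBand`, `FeatureCoverage`, and `band` are hypothetical; the authoritative shape is whatever the manifest's Zod schema accepts.

```typescript
// Hypothetical types. The statements/branches/functions/lines order is an assumption.
type CoverageBand = {
  statements: number;
  branches: number;
  functions: number;
  lines: number;
};

type CoverageLayer = "entities" | "use-cases" | "controllers";

type FeatureCoverage = Record<CoverageLayer, CoverageBand> & {
  baseline: CoverageBand;
  mutationTargets: CoverageLayer[];
};

// Helper so bands can be written in the same compact form as the list above.
function band(
  statements: number,
  branches: number,
  functions: number,
  lines: number,
): CoverageBand {
  return { statements, branches, functions, lines };
}

// Defaults mirroring the scaffolded shape listed above.
const defaultCoverage: FeatureCoverage = {
  entities: band(100, 100, 100, 100),
  "use-cases": band(100, 95, 100, 100),
  controllers: band(100, 95, 100, 100),
  baseline: band(80, 75, 80, 80),
  mutationTargets: ["entities", "use-cases"],
};
```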

Three readers:

1. **Vitest**: `vitest.config.ts` imports the manifest and derives its `coverage.thresholds` from it, which removes today's per-config duplication.
2. **`assertFeatureConformance`**: reads `coverage/lcov.info` for the package at boot and asserts each band. Drift fails the boot (matching the existing brand-assertion shape).
3. **`pnpm coverage:diff`**: uses `baseline` as the default expectation for non-layer-tagged files; stricter bands override per-path.
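
The Vitest reader could be sketched as a pure mapping from the manifest's bands to a `coverage.thresholds` object. This is an illustration under assumptions: `toVitestThresholds` and `LAYER_GLOBS` are hypothetical names, and the `src/...` globs are guessed from the layer directory names used elsewhere in this PRD. It relies on Vitest accepting glob-pattern keys inside `thresholds` as per-path overrides.

```typescript
type Band = { statements: number; branches: number; functions: number; lines: number };

type ManifestCoverage = {
  entities: Band;
  "use-cases": Band;
  controllers: Band;
  baseline: Band;
};

// Hypothetical source globs for each layer (the src/ prefix is an assumption).
const LAYER_GLOBS = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
} as const;

// Baseline numbers become the global thresholds; each layer band becomes a glob-keyed override.
function toVitestThresholds(coverage: ManifestCoverage): Record<string, number | Band> {
  const thresholds: Record<string, number | Band> = { ...coverage.baseline };
  for (const layer of Object.keys(LAYER_GLOBS) as Array<keyof typeof LAYER_GLOBS>) {
    thresholds[LAYER_GLOBS[layer]] = { ...coverage[layer] };
  }
  return thresholds;
}
```

In a feature's `vitest.config.ts`, the returned object would be assigned to `test.coverage.thresholds`, so changing a band in the manifest changes the test-time gate without touching the config.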

### Diff coverage algorithm

`pnpm coverage:diff [<base-ref>]` (`<base-ref>` defaults to `origin/main`, so the default range is `origin/main...HEAD`):

1. Run `git diff --unified=0 --no-color <base-ref>...HEAD` → list of `(file, hunk-start, hunk-end)` for changed/added lines.
2. Read merged `coverage/lcov.info` → executable-line + execution-count per file.
3. For each changed line that is _executable_ (skip comments, types, exports, declarations): assert execution count > 0.
4. Output stdout JSON `{ status: "pass" | "fail", uncovered: [{ file, line, kind }] }`.
5. Output stderr human-readable summary.
6. Exit 0 on pass, 1 on fail.

Files exempt from the diff gate, matched by filename: `*.test.ts`, `*.test.tsx`, `*.config.*`, `*.md`, `*.json`, `*.mjs`, `*.cjs`. The gate also applies the same excludes as L0's existing per-package coverage configuration (DI bootstrap, interfaces, CMS, UI, factories, contracts).
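
The steps above can be sketched as a pure function over the two text inputs. This is a minimal sketch, not the shipped `scripts/coverage/diff.mjs`: the function names are hypothetical, the `kind` value `"line"` is a placeholder, the exemption list is omitted, and it assumes lcov `SF:` paths are repo-relative so they match the paths in the diff.

```typescript
type UncoveredLine = { file: string; line: number; kind: string };
type DiffReport = { status: "pass" | "fail"; uncovered: UncoveredLine[] };

// Collect added/changed line numbers per file from `git diff --unified=0` output.
function parseChangedLines(diff: string): Map<string, number[]> {
  const changed = new Map<string, number[]>();
  let file: string | null = null;
  for (const row of diff.split("\n")) {
    if (row.startsWith("+++ ")) {
      // "+++ b/path" names the new side; "+++ /dev/null" marks a deleted file.
      file = row.startsWith("+++ b/") ? row.slice(6) : null;
      continue;
    }
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/.exec(row);
    if (hunk && file !== null) {
      const start = Number(hunk[1]);
      const count = hunk[2] === undefined ? 1 : Number(hunk[2]); // omitted count means 1
      const lines = changed.get(file) ?? [];
      for (let i = 0; i < count; i++) lines.push(start + i);
      changed.set(file, lines);
    }
  }
  return changed;
}

// Map file -> (line -> execution count) from lcov "SF:" and "DA:<line>,<count>" records.
function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const hits = new Map<string, Map<number, number>>();
  let current: Map<number, number> | null = null;
  for (const row of lcov.split("\n")) {
    if (row.startsWith("SF:")) {
      current = new Map<number, number>();
      hits.set(row.slice(3).trim(), current);
    } else if (row.startsWith("DA:") && current) {
      const [line, count] = row.slice(3).split(",");
      current.set(Number(line), Number(count));
    } else if (row.startsWith("end_of_record")) {
      current = null;
    }
  }
  return hits;
}

function diffCoverage(diff: string, lcov: string): DiffReport {
  const hits = parseLcov(lcov);
  const uncovered: UncoveredLine[] = [];
  parseChangedLines(diff).forEach((lines, file) => {
    const fileHits = hits.get(file);
    // No lcov record: treated as excluded here. A real gate would likely need to
    // distinguish excluded files from source files that no test ever loaded.
    if (!fileHits) return;
    for (const line of lines) {
      // Lines without a DA record are not executable (comments, types) and are skipped.
      if (fileHits.get(line) === 0) uncovered.push({ file, line, kind: "line" });
    }
  });
  return { status: uncovered.length === 0 ? "pass" : "fail", uncovered };
}
```

A wrapper script would print `JSON.stringify(report)` to stdout, a readable summary to stderr, and exit with 1 when `status` is `"fail"`.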

### Aggregate report

`pnpm coverage:aggregate`:

1. Collect every `packages/**/coverage/lcov.info` + `apps/**/coverage/lcov.info`.
2. Merge into `coverage/lcov.info` via `lcov-result-merger` or `monocart-coverage-reports`.
3. Emit `coverage/summary.json`:
   ```json
   {
     "generatedAt": "2026-05-13T12:34:56Z",
     "commit": "abc1234",
     "repo": { "statements": 87.4, "branches": 81.2, "functions": 89.0, "lines": 87.4 },
     "byPackage": {
       "@repo/auth": { "statements": 96.1, ... },
       ...
     }
   }
   ```
4. Re-render HTML at `coverage/html/index.html` for human drill-down (gitignored).

`coverage/summary.json` is committed; `coverage/lcov.info` + `coverage/html/` are gitignored.
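
One way the summary could be computed from the merged lcov text is sketched below. It is illustrative only: `summarize` and `packageOf` are hypothetical names, packages are keyed by their first two path segments rather than by package name, and the commit and timestamp are passed in by the caller. The lcov format carries line, branch, and function totals (`LF`/`LH`, `BRF`/`BRH`, `FNF`/`FNH`) but no statement counter, so this sketch omits `statements`; producing it would need another source such as the coverage provider's JSON output.

```typescript
type Totals = { found: number; hit: number };
type Counters = { lines: Totals; branches: Totals; functions: Totals };
type Percentages = { lines: number; branches: number; functions: number };

const emptyCounters = (): Counters => ({
  lines: { found: 0, hit: 0 },
  branches: { found: 0, hit: 0 },
  functions: { found: 0, hit: 0 },
});

// Percentage rounded to one decimal; an empty denominator counts as fully covered.
function pct(t: Totals): number {
  return t.found === 0 ? 100 : Math.round((t.hit / t.found) * 1000) / 10;
}

// Hypothetical grouping rule: "packages/auth/src/x.ts" belongs to "packages/auth".
function packageOf(file: string): string {
  return file.split("/").slice(0, 2).join("/");
}

// lcov per-file summary keys and the counter each one feeds.
const LCOV_KEYS: Record<string, [keyof Counters, keyof Totals]> = {
  LF: ["lines", "found"],
  LH: ["lines", "hit"],
  BRF: ["branches", "found"],
  BRH: ["branches", "hit"],
  FNF: ["functions", "found"],
  FNH: ["functions", "hit"],
};

function summarize(lcov: string, commit: string, generatedAt: string) {
  const repo = emptyCounters();
  const perPackage = new Map<string, Counters>();
  let pkg = emptyCounters();
  for (const row of lcov.split("\n")) {
    const sep = row.indexOf(":");
    if (sep < 0) continue; // e.g. "end_of_record"
    const key = row.slice(0, sep);
    const value = row.slice(sep + 1).trim();
    if (key === "SF") {
      const name = packageOf(value);
      pkg = perPackage.get(name) ?? emptyCounters();
      perPackage.set(name, pkg);
      continue;
    }
    const target = LCOV_KEYS[key];
    if (!target) continue;
    const [metric, field] = target;
    repo[metric][field] += Number(value);
    pkg[metric][field] += Number(value);
  }
  const toPct = (c: Counters): Percentages => ({
    lines: pct(c.lines),
    branches: pct(c.branches),
    functions: pct(c.functions),
  });
  const byPackage: Record<string, Percentages> = {};
  perPackage.forEach((counters, name) => {
    byPackage[name] = toPct(counters);
  });
  return { generatedAt, commit, repo: toPct(repo), byPackage };
}
```

Keeping the function pure (no clock, no `git` call) makes it straightforward to test against synthetic lcov fixtures.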

### Mutation testing

`pnpm mutate [--filter @repo/<feature>]`:

- Uses Stryker with `@stryker-mutator/vitest-runner`.
- Per-feature `stryker.config.json` scaffolded by the generator.
- Mutation scope honors the manifest's `mutationTargets` (entities + use-cases by default).
- Output: `reports/mutation/<feature>/` (HTML + JSON; gitignored).
- Mutation score threshold: 80% per feature (tunable in manifest); not enforced in default `pnpm test`.
- Optional nightly GH Action: `mutation-nightly.yml` — runs across all features, opens an issue on score drop > 5%.
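
As an illustration of how the per-feature Stryker config might follow from the manifest, the sketch below builds the config object from `mutationTargets`. The `buildStrykerConfig` name, the `src/...` globs, and the report file names are hypothetical, and wiring the 80% score to Stryker's `thresholds.break` (so that `pnpm mutate` itself fails below it) is an assumption rather than something this PRD specifies.

```typescript
// Hypothetical source globs per mutation target (the src/ prefix is an assumption).
const TARGET_GLOBS: Record<string, string> = {
  entities: "src/entities/**/*.ts",
  "use-cases": "src/application/use-cases/**/*.ts",
};

function buildStrykerConfig(feature: string, targets: string[], mutationThreshold = 80) {
  // Unknown targets are dropped rather than guessed at.
  const mutate = targets
    .map((target) => TARGET_GLOBS[target])
    .filter((glob): glob is string => glob !== undefined);

  return {
    testRunner: "vitest",
    plugins: ["@stryker-mutator/vitest-runner"],
    // Negated glob keeps test files themselves out of the mutation scope.
    mutate: [...mutate, "!src/**/*.test.ts"],
    reporters: ["html", "json", "clear-text"],
    htmlReporter: { fileName: `reports/mutation/${feature}/index.html` },
    jsonReporter: { fileName: `reports/mutation/${feature}/mutation.json` },
    thresholds: {
      high: Math.max(90, mutationThreshold),
      low: mutationThreshold,
      break: mutationThreshold, // assumption: a score below this fails the run
    },
  };
}
```

Serializing the result with `JSON.stringify` would give the content of a scaffolded `stryker.config.json`.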

### Boot-time assertion

`assertFeatureConformance` (in `core-shared/conformance/`) gains a new check:

1. Look for `coverage/lcov.info` at the feature package root.
2. If absent: skip with a `logger.debug(...)` (dev mode is allowed to lack coverage data).
3. If present: parse, compute per-layer % for `entities/`, `application/use-cases/`, `interface-adapters/controllers/`.
4. Compare against the manifest's `coverage:` bands. Fail boot on breach with a `CoverageDriftError`.

Graceful degradation in dev (`USE_DEV_SEED=true`): assertion logs warning instead of throwing, so `pnpm dev` boots without a fresh coverage run.
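
The comparison and degradation logic described above might look like the following. It is a sketch under assumptions: only `CoverageDriftError` is named in this PRD, while `assertCoverage`, its parameters, and the message format are hypothetical. Parsing lcov into per-layer percentages is left to the caller, and metrics the caller could not measure are skipped.

```typescript
type LayerBand = { statements: number; branches: number; functions: number; lines: number };
type LayerName = "entities" | "use-cases" | "controllers";
type MeasuredCoverage = Partial<Record<LayerName, Partial<LayerBand>>>;

const METRICS = ["statements", "branches", "functions", "lines"] as const;

class CoverageDriftError extends Error {
  breaches: string[];
  constructor(breaches: string[]) {
    super(`Coverage drift: ${breaches.join("; ")}`);
    this.name = "CoverageDriftError";
    this.breaches = breaches;
  }
}

function assertCoverage(
  measured: MeasuredCoverage | null, // null when coverage/lcov.info is absent
  bands: Record<LayerName, LayerBand>,
  options: { devSeed: boolean; warn: (message: string) => void },
): void {
  if (measured === null) return; // no coverage data: skip the check

  const breaches: string[] = [];
  for (const layer of Object.keys(bands) as LayerName[]) {
    const actual = measured[layer];
    if (!actual) continue;
    for (const metric of METRICS) {
      const value = actual[metric];
      const expected = bands[layer][metric];
      if (value !== undefined && value < expected) {
        breaches.push(`${layer} ${metric} ${value}% < ${expected}%`);
      }
    }
  }
  if (breaches.length === 0) return;

  const error = new CoverageDriftError(breaches);
  if (options.devSeed) {
    // Graceful degradation: warn instead of blocking the dev boot.
    options.warn(error.message);
    return;
  }
  throw error;
}
```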

### CI integration

After the existing test step, `.github/workflows/ci.yml` runs four steps (the artifact upload already exists; the commit-back lives in a separate workflow):

1. `pnpm coverage:aggregate` (always)
2. `pnpm coverage:diff` (always; fails build on uncovered diff)
3. Upload `coverage/lcov.info` + `coverage/summary.json` as artifact (existing flow)
4. On merge to main only: commit `coverage/summary.json` back to the repo (separate workflow with `permissions: contents: write`)

### Generator integration

`pnpm turbo gen feature` template emits:

- Manifest with `coverage: { ... defaults ... }` section at a `<gen:coverage>` anchor
- `vitest.config.ts` that imports the manifest and derives `coverage.thresholds` from it
- `stryker.config.json` at the package root, scoped to `entities/` + `application/use-cases/`

The CI guard at `packages/core-eslint/anchors.test.js` adds the new anchor.

### Hook integration

`.claude/hooks/prompt-context.sh` gains a new keyword group:

```bash
if echo "$prompt" | grep -qE 'coverage|uncovered|mutation|stryker|lcov'; then
  inject+=('Coverage: ADR-020 + docs/guides/coverage.md. 4 layers: L0 vitest thresholds, L1 pnpm coverage:diff, L2 coverage/summary.json, L3 pnpm mutate. Manifest-driven via feature.manifest.ts coverage section.')
fi
```

## Testing decisions

A good test for this initiative covers behavior through the public surface:

- The diff-coverage script gets tested by feeding it a synthetic `git diff` + `lcov.info` and asserting JSON output shape and exit code. Fixtures, not e2e.
- The aggregate script gets unit-tested by feeding it N synthetic per-package lcov files and asserting the merged output structure.
- `assertFeatureConformance` gets a new test in `core-shared/conformance/` that constructs a manifest + a `coverage/lcov.info` fixture and asserts pass/fail behavior. No real test run.
- Stryker integration gets an integration-test style verification: run `pnpm mutate --filter @repo/auth` in CI nightly and assert mutation score > 80% on a known-good commit.
- The manifest schema gets a Zod schema test: invalid `coverage:` shapes fail parse with a specific error message.

**Modules to add test coverage to:**

- `scripts/coverage/diff.mjs` (the diff coverage runner)
- `scripts/coverage/aggregate.mjs` (the merger)
- `packages/core-shared/src/conformance/assert-coverage.ts` (the boot-time reader + comparator)
- `packages/core-shared/src/conformance/manifest-schema.ts` (extended Zod schema)

**Prior art to mirror:**

- `scripts/work/state-builder.mjs` — for the diff-coverage script's shape (Node ESM, no deps, fixture-based test)
- `packages/core-eslint/anchors.test.js` — for the new anchor's CI guard pattern
- Existing `assertFeatureConformance` brand-assertion code — for the boot-time check shape

## Open questions

- **Q1: Mutation score threshold per feature** — start at 80% across the board, or tune per-feature in the manifest? **Recommended:** start at 80% baseline, allow per-feature override via `coverage.mutationThreshold`.
- **Q2: When `coverage/lcov.info` is stale at boot** — fail loudly or warn? **Recommended:** warn in dev, fail in `NODE_ENV=production` boot.
- **Q3: Should `pnpm coverage:diff` run automatically as a Git pre-push hook?** **Recommended:** no — pre-commit already runs; pre-push adds latency without proportional value. Keep it CI-only + dispatch-loop.
- **Q4: How is `coverage/summary.json` committed safely on merge to main?** **Recommended:** separate workflow with a github-actions[bot] identity, `permissions: contents: write`, skipped if no diff in summary.
- **Q5: `monocart-coverage-reports` vs. `lcov-result-merger`?** **Recommended:** monocart — actively maintained, single-binary, handles V8 + Istanbul, no Python dep.

## Out of scope (deferred to future PRDs)

- **Codecov / SaaS dashboards.** Aggregate is committed; SaaS is gold-plating.
- **Mutation testing on `infrastructure/`, `interface-adapters/`, `integrations/`.** Bigger surface, slower runs; revisit after L3 v1 is stable.
- **Coverage-aware Storybook screenshot diffing.** Visual regression is its own concern.
- **Code review heuristics from coverage** (e.g., "this PR touched a 100%-covered file and dropped to 95% — needs review tag"). Possible follow-up via fallow audit.

## L0 verification findings (2026-05-13, during brainstorm)

Ran `pnpm --filter @repo/<x> test -- --coverage --run` per feature. State of the declared 100%-on-entities/use-cases/controllers:

| Feature                 | State                | Detail                                                                                                                                                                                                                                                                                                          |
| ----------------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `@repo/auth`            | ✅ green             | 21 tests, 93.7% overall, all per-layer bands hit 100%                                                                                                                                                                                                                                                           |
| `@repo/blog`            | ✅ green             | passed (no threshold errors surfaced)                                                                                                                                                                                                                                                                           |
| `@repo/marketing-pages` | ✅ green             | passed                                                                                                                                                                                                                                                                                                          |
| `@repo/navigation`      | ❌ real gap          | `entities/`: 86.36% lines / 50% functions; `controllers/`: 86.66% lines / 80% branches                                                                                                                                                                                                                          |
| `@repo/media`           | ❌ config + real gap | (1) Missing `@vitest/coverage-v8` dev dep — `--coverage` crashed. (2) `vitest.config.ts` had NO coverage block at all (no per-layer thresholds, no excludes). When the standard block was applied, real gaps surfaced in `controllers/` (one controller at 86.66% lines / 75% branches, lines 19-20 uncovered). |

**Conclusions reinforcing the PRD:**

- The L0 layer is real (auth/blog/marketing-pages prove the 100/95/100/100 bar is achievable).
- The duplication problem is real (5 features, 4 different vitest configs, one lacking the coverage block entirely), which is exactly the drift the manifest-driven keystone eliminates.
- Real test gaps exist in 2 of 5 features (navigation, media). Fixing them is part of the implementation epic, not a blocker for this design landing.

**Patches landed alongside the PRD (non-blocking for CI):**

- `packages/media/package.json` — added `@vitest/coverage-v8` dev dep so `pnpm test -- --coverage` no longer crashes.
- `packages/media/vitest.config.ts` — left intentionally minimal (no coverage block); commented to point at the L0 unification story.

The full per-layer block for media + the missing tests in navigation + media land in the **L0 unification** story of the implementation epic, as the first work item after the manifest schema lands.

## Further notes

- Builds on: ADR-011 (TDD foundation), ADR-006 (vertical-feature-packages), the existing 5-gate conformance system.
- ADR-020 is the durable record of the architecture decisions; this PRD is the implementation seed.
- Each layer should land as a separate story (L0 unification, manifest schema + auto-derive, L1 diff, L2 aggregate, L3 mutation, ADR + docs). Estimated effort: one mid-sized epic, 6–8 stories.
- This is the canonical example of "agent-first observability": every layer optimized for machine consumption first, human consumption second.