agentic-dev/docs/guides/coverage.md
Danijel Martinek fc27eef6eb docs(coverage): sync docs to shipped state + wire sandcastle prompts
Closes the staleness gap after the 10-commit coverage epic shipped.

Doc sync (item 1 from the user's choice):
  - CLAUDE.md Quick Start: adds pnpm coverage:aggregate / coverage:diff
    / mutate to the command listing
  - CLAUDE.md: new "Sibling architecture: coverage (ADR-020)" section
    after the conformance gate table — captures the 4-layer table +
    points at docs/guides/coverage.md + ADR-020 + says agents must run
    coverage:diff before reporting complete
  - AGENTS.md preamble: now lists coverage as a parallel multi-latency
    quality system alongside conformance, with the same gate / latency
    framing
  - PRD frontmatter: status draft -> shipped + shipped date +
    shipping-commits list (all 10 SHAs anchoring the trace)
  - PRD findings table: each row gets a Resolution column citing the
    commit that closed it; conclusion text updated to past tense
  - ADR-020 implementation phasing: rewritten as a status table with
    each step linked to the commit that shipped it + Boot-time
    assertFeatureConformance explicitly marked Deferred with rationale
  - docs/guides/coverage.md: removed "Boot wiring lands in the next
    story" line; replaced with the deferral rationale + clarified
    that two readers (vitest, coverage:diff) consume the manifest

Sandcastle prompts (item 2 from the user's choice):
  - .sandcastle/implementer.prompt.md: new "Coverage gates" section
    after the conformance-gates list, requiring `pnpm test --coverage`,
    `pnpm coverage:aggregate`, and `pnpm coverage:diff` to all pass
    before reporting `complete`. Machine-readable JSON shape of
    coverage:diff documented (status / uncovered[] / kind enum), with
    explicit instructions on how to interpret each kind. Allowlist
    expansion requires justification + test.
  - .sandcastle/reviewer.prompt.md: AC coverage relabeled to "AC
    coverage (acceptance criteria, not test coverage)" to disambiguate;
    new check #7 "Coverage gates (ADR-020)" requiring CI's
    Coverage — diff (L1) step green + per-layer thresholds met +
    no silent allowlist expansion + manifest band drift detection.

Effect: future agent runs through sandcastle now treat coverage as a
first-class blocking gate, parallel to conformance. PRs no longer
discover coverage failures only via CI; the implementer is required
to check before reporting done, and the reviewer is required to
verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 16:47:16 +02:00


# Coverage
> **Architecture:** [ADR-020](../decisions/adr-020-coverage-architecture.md). **Glossary:** [docs/glossary.md → Coverage](../glossary.md#coverage).
The agent-first coverage architecture has four layers. This guide is the day-to-day reference for working with them.
## The four layers at a glance
| Layer | Question it answers | Command |
| ---------------------------------- | ------------------------------------------------ | ---------------------------------------------------- |
| **L0** Per-layer vitest thresholds | "Did the last test run meet the declared bands?" | `pnpm test -- --coverage` |
| **L1** Diff coverage | "Did this PR/slice cover its own changed lines?" | `pnpm coverage:diff` |
| **L2** Aggregate trend             | "How is coverage trending across the repo?"      | `pnpm coverage:aggregate` → `coverage/summary.json`  |
| **L3** Mutation testing | "Do my tests actually assert anything?" | `pnpm mutate` _(opt-in, not in default `pnpm test`)_ |
Each layer answers a distinct question. They compose; none replaces the others.
## Single source of truth: `feature.manifest.ts`
Every feature declares its coverage expectations once, in `feature.manifest.ts`:
```ts
export const myFeatureManifest = defineFeature({
  // ...
  coverage: {
    bands: {
      baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
      entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
      "use-cases": {
        statements: 100,
        branches: 95,
        functions: 100,
        lines: 100,
      },
      controllers: {
        statements: 100,
        branches: 95,
        functions: 100,
        lines: 100,
      },
    },
    mutationTargets: ["entities", "use-cases"],
  },
} as const);
```
Two readers pick this up today:
1. **`vitest.config.ts`** — uses `vitestThresholdsFromBands(DEFAULT_COVERAGE_BANDS)` from `@repo/core-shared/conformance/coverage`. Most features import `DEFAULT_COVERAGE_BANDS` directly (the manifest's `coverage` section matches the defaults). For features with custom bands, override at the vitest config too.
2. **`pnpm coverage:diff`** — uses the bands for per-path expectations against the merged lcov.
(A third reader, a boot-time `assertFeatureConformance` coverage check, was specified in the PRD and explicitly deferred per ADR-020 — when both readers above derive from the same manifest, the drift it was supposed to catch is mechanically impossible. The manifest's `coverage:` field remains the declarative source of truth regardless of how many readers consume it.)
**Edit the manifest. The other readers pick up the change.**
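
To make the band-to-threshold step concrete, here is a minimal sketch of how a set of bands could be turned into the shape vitest accepts under `coverage.thresholds` (global numbers plus glob-keyed overrides). It is not the actual `vitestThresholdsFromBands` implementation: the function name `thresholdsFromBands`, the `LAYER_GLOBS` table, and the types are assumptions for illustration. The two globs are the paths this guide lists for the mutation scope; other layers are left out because their paths are not stated here.

```typescript
// Hypothetical sketch; the real vitestThresholdsFromBands may differ.
type Band = { statements: number; branches: number; functions: number; lines: number };
type Bands = { baseline: Band; [layer: string]: Band | undefined };

// Assumed layer -> glob mapping (only the two paths named in this guide).
const LAYER_GLOBS: Record<string, string> = {
  entities: "src/entities/**/*.ts",
  "use-cases": "src/application/use-cases/**/*.ts",
};

// Shape accepted by vitest's coverage.thresholds: top-level numbers are
// global thresholds, glob keys carry per-path thresholds.
type VitestThresholds = { [key: string]: number | Band };

function thresholdsFromBands(bands: Bands): VitestThresholds {
  // The baseline band becomes the global thresholds.
  const thresholds: VitestThresholds = { ...bands.baseline };
  for (const [layer, glob] of Object.entries(LAYER_GLOBS)) {
    const band = bands[layer];
    // Layers without their own band get no glob entry, so only the
    // global (baseline) thresholds apply to them.
    if (band !== undefined) thresholds[glob] = band;
  }
  return thresholds;
}

const exampleThresholds = thresholdsFromBands({
  baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
});
console.log(exampleThresholds);
```

Keeping the conversion a pure function of the bands is what lets both readers derive from the same declaration without drifting.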
## Daily workflow
### Before pushing
```bash
pnpm test -- --coverage # L0 — per-package thresholds enforced
pnpm coverage:aggregate # L2 — produce coverage/lcov.info + summary.json
pnpm coverage:diff # L1 — fails if changed lines aren't covered
```
The diff coverage step compares against `origin/main` by default. To compare against a different base:
```bash
pnpm coverage:diff -- --base HEAD~1
pnpm coverage:diff -- --base origin/release
```
For machine consumption (e.g., the agent dispatch loop):
```bash
pnpm coverage:diff -- --json | jq .uncovered
```
### Reading a failure
On failure, `pnpm coverage:diff` exits with code 1 and writes a JSON report to stdout and a human-readable summary to stderr:
**stderr** (human):
```
[coverage:diff] FAIL — 3 uncovered hit(s) across 2 file(s):
packages/blog/src/application/use-cases/publish-article.use-case.ts
uncovered lines: 47, 48
packages/auth/src/entities/models/session.ts
uncovered lines: 22
```
**stdout** (JSON, also written for the dispatch loop):
```json
{
  "status": "fail",
  "summary": {
    "filesChanged": 4,
    "filesGated": 2,
    "uncoveredCount": 3
  },
  "fileSummaries": [...],
  "uncovered": [
    { "file": "...", "line": 47, "kind": "uncovered" },
    { "file": "...", "line": 48, "kind": "uncovered" },
    { "file": "...", "line": 22, "kind": "uncovered" }
  ]
}
```
`kind` is one of:
- `uncovered` — line is executable per lcov, execution count is 0
- `no-coverage-data` — entire file isn't in lcov (likely a new untested file)
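
For a consumer such as the dispatch loop, the report can be folded into a per-file to-do list. The sketch below is one possible way to do that, based only on the fields shown above. The type names, the `groupHits` helper, and the optional `line` field are assumptions (this guide does not say whether `no-coverage-data` hits carry a line number), and the third sample path is hypothetical.

```typescript
// Types inferred from the JSON sample in this guide; not an official schema.
type UncoveredKind = "uncovered" | "no-coverage-data";

interface UncoveredHit {
  file: string;
  line?: number; // assumption: may be absent for no-coverage-data hits
  kind: UncoveredKind;
}

interface FileGaps {
  lines: number[];        // executable lines with an execution count of 0
  needsTestFile: boolean; // true when the whole file is missing from lcov
}

// Group hits by file so each file is visited once when fixing gaps.
function groupHits(hits: UncoveredHit[]): Map<string, FileGaps> {
  const byFile = new Map<string, FileGaps>();
  for (const hit of hits) {
    const gaps = byFile.get(hit.file) ?? { lines: [], needsTestFile: false };
    if (hit.kind === "no-coverage-data") {
      gaps.needsTestFile = true;
    } else if (hit.line !== undefined) {
      gaps.lines.push(hit.line);
    }
    byFile.set(hit.file, gaps);
  }
  return byFile;
}

const sampleHits: UncoveredHit[] = [
  { file: "packages/blog/src/application/use-cases/publish-article.use-case.ts", line: 47, kind: "uncovered" },
  { file: "packages/blog/src/application/use-cases/publish-article.use-case.ts", line: 48, kind: "uncovered" },
  { file: "packages/auth/src/entities/models/session.ts", line: 22, kind: "uncovered" },
  // Hypothetical file, added only to show the second kind.
  { file: "packages/blog/src/entities/models/draft.ts", kind: "no-coverage-data" },
];

const grouped = groupHits(sampleHits);
grouped.forEach((gaps, file) => console.log(file, gaps));
```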
### Fixing an uncovered slice
1. Read the JSON. For each `uncovered` hit, navigate to `<file>:<line>`.
2. Identify which test would have exercised that line. Usually it's missing a branch case or an error path.
3. Add the test (TDD: write failing test → make it green).
4. Re-run `pnpm test --coverage --filter @repo/<feature>` to verify.
5. Re-run `pnpm coverage:diff` to confirm exit 0.
For `no-coverage-data` hits, write the sibling test file. The ESLint conformance rule `usecase-must-have-test-file` will start failing anyway if you don't.
### What's exempt (the allowlist)
The diff coverage gate skips:
- Test files (`*.test.ts`, `*.test.tsx`, `*.test.mjs`)
- Fixtures, factories, contracts, seeds (`__fixtures__/`, `__factories__/`, `__contracts__/`, `__seeds__/`)
- Config files (`*.config.{ts,js,mjs,cjs}`, `package.json`, `tsconfig.*.json`, `turbo.json`)
- Docs and data (`*.md`, `*.json`, `*.yaml`, `.gitignore`, `.npmrc`)
- Shell scripts (`*.sh`, `*.bash`)
- Dev tooling under `scripts/` and `turbo/generators/`
- Per-feature excludes mirrored from vitest (`di/bind-production.ts`, `application/repositories/**`, `application/services/**`, `integrations/cms/**`, `ui/**`, `*.interface.ts`, `index.ts` barrels)
- Build artifacts (`dist/`, `.next/`, `.turbo/`, `node_modules/`, `coverage/`)
The allowlist lives in `scripts/coverage/diff.mjs` and is unit-tested.
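
As a rough illustration of how such an allowlist can be expressed, the sketch below checks a path against a handful of regular expressions covering part of the list above. It is not the code in `scripts/coverage/diff.mjs`; the names `EXEMPT_PATTERNS` and `isExempt` are assumptions, and the per-feature excludes are deliberately left out.

```typescript
// Hypothetical subset of the allowlist, written as regular expressions.
const EXEMPT_PATTERNS: RegExp[] = [
  /\.test\.(ts|tsx|mjs)$/,                               // test files
  /(^|\/)__(fixtures|factories|contracts|seeds)__\//,    // fixtures and friends
  /\.config\.(ts|js|mjs|cjs)$/,                          // config files
  /\.(md|json|yaml|sh|bash)$/,                           // docs, data, shell scripts
  /^(scripts|turbo\/generators)\//,                      // dev tooling
  /(^|\/)(dist|\.next|\.turbo|node_modules|coverage)\//, // build artifacts
];

// A changed file is skipped by the gate when any pattern matches its path.
function isExempt(path: string): boolean {
  return EXEMPT_PATTERNS.some((pattern) => pattern.test(path));
}

console.log(isExempt("packages/auth/src/entities/models/session.test.ts"));
console.log(isExempt("packages/auth/src/entities/models/session.ts"));
```

A table of patterns plus one predicate keeps the allowlist easy to unit-test: each new exemption is one pattern and one test case.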
## Adjusting bands
### To raise the bar on a feature
Edit `packages/<feature>/src/feature.manifest.ts`:
```ts
coverage: {
  bands: {
    baseline: { statements: 90, branches: 85, functions: 90, lines: 90 }, // tighter
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 100, functions: 100, lines: 100 }, // bumped branches
    controllers: { statements: 100, branches: 95, functions: 100, lines: 100 },
  },
}
```
If the new bands are stricter than the defaults, also update `packages/<feature>/vitest.config.ts` to use `vitestThresholdsFromManifest(myFeatureManifest)` instead of `DEFAULT_COVERAGE_BANDS`. _(Note: importing the manifest from a vitest config has tooling constraints, so treat the `DEFAULT_COVERAGE_BANDS` route as the default and switch only when a feature needs custom bands.)_
### To skip a layer
Omit it from `bands`. The layer falls through to `baseline`:
```ts
coverage: {
  bands: {
    baseline: { ... },
    entities: { ... },
    // controllers omitted -> matches baseline
  },
}
```
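
The fall-through rule amounts to a one-line lookup. The sketch below shows it with a hypothetical `resolveBand` helper and locally defined types; the real resolution logic may be organised differently.

```typescript
type Band = { statements: number; branches: number; functions: number; lines: number };
type Bands = { baseline: Band; [layer: string]: Band | undefined };

// Hypothetical helper: a layer without its own band inherits the baseline.
function resolveBand(bands: Bands, layer: string): Band {
  return bands[layer] ?? bands.baseline;
}

const exampleBands: Bands = {
  baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
  // controllers omitted on purpose
};

console.log(resolveBand(exampleBands, "entities").branches);
console.log(resolveBand(exampleBands, "controllers").branches);
```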
## CI behavior
`.github/workflows/ci.yml` (validate job) runs three coverage steps after the test step:
1. **Test with coverage** — produces per-package `coverage/lcov.info`
2. **Coverage — aggregate (L2)** — merges to root `coverage/lcov.info` + `coverage/summary.json`
3. **Coverage — diff (L1)** — only on pull requests, diffs against `origin/<base-ref>`
On merge to main, `.github/workflows/coverage-snapshot.yml` re-aggregates and commits the updated `coverage/summary.json` back to main. Trend history accumulates via `git log -- coverage/summary.json`.
## Reading the trend
```bash
git log --oneline --follow -- coverage/summary.json | head -10
git show <sha> -- coverage/summary.json | grep -E '"statements"|"branches"'
```
`coverage/summary.json` is the only committed coverage artifact. Each snapshot includes:
- `generatedAt` — ISO timestamp
- `commit` — short SHA
- `repo` — repo-wide percentages + raw counts
- `byPackage` — per-package percentages, keyed by `@repo/<name>`
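
Two snapshots can also be compared programmatically. The sketch below lists packages whose percentage dropped between two snapshots. The top-level field names come from the list above, but the inner shape of `byPackage` entries (one number per metric) is an assumption, as are the `coverageDrops` helper and all sample values.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";

// Assumed inner shape: one percentage per metric for each package.
type PackagePercentages = Record<Metric, number>;

interface Snapshot {
  generatedAt: string; // ISO timestamp
  commit: string;      // short SHA
  byPackage: Record<string, PackagePercentages>;
}

interface Drop {
  pkg: string;
  delta: number; // negative: percentage points lost
}

// Report packages whose chosen metric is lower in `next` than in `prev`.
function coverageDrops(prev: Snapshot, next: Snapshot, metric: Metric): Drop[] {
  const drops: Drop[] = [];
  for (const [pkg, now] of Object.entries(next.byPackage)) {
    const before = prev.byPackage[pkg];
    if (before === undefined) continue; // package is new, nothing to compare
    const delta = now[metric] - before[metric];
    if (delta < 0) drops.push({ pkg, delta });
  }
  return drops;
}

// Made-up sample snapshots, for illustration only.
const older: Snapshot = {
  generatedAt: "2026-05-12T10:00:00.000Z",
  commit: "aaaaaaa",
  byPackage: {
    "@repo/auth": { statements: 95, branches: 93, functions: 96, lines: 95 },
    "@repo/blog": { statements: 90, branches: 88, functions: 91, lines: 90 },
  },
};
const newer: Snapshot = {
  generatedAt: "2026-05-13T10:00:00.000Z",
  commit: "bbbbbbb",
  byPackage: {
    "@repo/auth": { statements: 95, branches: 91.5, functions: 96, lines: 95 },
    "@repo/blog": { statements: 92, branches: 88, functions: 91, lines: 92 },
  },
};

const branchDrops = coverageDrops(older, newer, "branches");
console.log(branchDrops);
```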
## Mutation testing (L3)
Stryker runs mutation testing against `entities/` and `application/use-cases/`, the pure-business-logic surface. It is not part of `pnpm test` because it is slow; it runs on demand and nightly via a GitHub Action.
### Running
```bash
pnpm mutate # every feature with a stryker.config.json
pnpm mutate -- --filter @repo/auth # one feature
pnpm mutate -- --since main # incremental against base ref
pnpm mutate -- --json # machine-readable summary
```
### Configuration
Each feature has a slim `stryker.config.json` that extends the shared base:
```json
{
  "$schema": "../../node_modules/@stryker-mutator/core/schema/stryker-schema.json",
  "extends": "@repo/core-testing/stryker.base.json"
}
```
The base lives at `packages/core-testing/stryker.base.json` and defines:
- **Test runner**: vitest (uses each feature's `vitest.config.ts`)
- **Scope**: `src/entities/**/*.ts` and `src/application/use-cases/**/*.ts` (excludes tests/factories/contracts)
- **Thresholds**: high 90 / low 80 / break 80 (`break` is the fail threshold)
- **Reporters**: progress, html (`reports/mutation/index.html`), json (`reports/mutation/mutation.json`)
- **Incremental mode**: enabled (subsequent runs skip mutants whose source + tests haven't changed)
- **Concurrency**: 4 workers
To override per feature (rare), add fields to the feature's `stryker.config.json`:
```json
{
  "extends": "@repo/core-testing/stryker.base.json",
  "thresholds": { "high": 95, "low": 85, "break": 85 },
  "mutate": ["src/entities/**/*.ts"]
}
```
### CI: nightly run + on-demand
`.github/workflows/mutation-nightly.yml` runs Stryker across every feature at 02:30 UTC and on `workflow_dispatch`. The dispatch UI accepts a `filter` input (e.g. `@repo/auth`) for targeted reruns. Reports are uploaded as the `mutation-reports` artifact (30-day retention). On meaningful score drops, the workflow opens a tracking issue labelled `mutation-testing`.
### What you're looking for
Stryker's `mutation.json` reports the **mutation score** (killed mutants / total) per file. A surviving mutant means: the mutator changed source code (e.g., `<` → `<=`, `&&` → `||`, removed a line, etc.), reran the tests, and they STILL passed. That's a test that exists + executes the code but doesn't actually assert behavior.
Fix: read the surviving mutant's diff in `reports/mutation/index.html`, identify the assertion that should have caught it, add the assertion.
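
To make the score concrete, the sketch below computes it from a list of mutant statuses and checks it against the `break` threshold. It follows Stryker's usual definition as I understand it (timeouts count as detected; compile errors, runtime errors, and ignored mutants are excluded from the total), which is slightly more precise than "killed / total". The helper names are hypothetical and the sample statuses are made up.

```typescript
// Status names as used in Stryker's JSON mutation report.
type MutantStatus =
  | "Killed" | "Survived" | "NoCoverage" | "Timeout"
  | "CompileError" | "RuntimeError" | "Ignored" | "Pending";

// Score = detected / (detected + undetected) * 100.
// Returns NaN when there are no valid mutants to score.
function mutationScore(statuses: MutantStatus[]): number {
  let detected = 0;
  let undetected = 0;
  for (const s of statuses) {
    if (s === "Killed" || s === "Timeout") detected += 1;
    else if (s === "Survived" || s === "NoCoverage") undetected += 1;
    // Other statuses are not counted as valid mutants.
  }
  const valid = detected + undetected;
  return valid === 0 ? NaN : (detected / valid) * 100;
}

// A run fails when the score falls below the break threshold.
function passesBreak(score: number, breakThreshold: number): boolean {
  return score >= breakThreshold;
}

// Made-up sample: 8 killed, 1 survived, 1 uncovered, 1 compile error.
const sampleStatuses: MutantStatus[] = [
  "Killed", "Killed", "Killed", "Killed", "Killed", "Killed", "Killed", "Killed",
  "Survived", "NoCoverage", "CompileError",
];

const sampleScore = mutationScore(sampleStatuses);
console.log(sampleScore, passesBreak(sampleScore, 80));
```

With the base thresholds in this guide (`break` 80), the sample sits exactly on the line: one more surviving mutant would fail the run.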
## Troubleshooting
**"Cannot find module '@vitest/coverage-v8'"** — your feature's `package.json` is missing `@vitest/coverage-v8` as a dev dep. Add it. (This was the issue surfaced for media during the L0 audit.)
**"Coverage for lines (X%) does not meet 'src/...' threshold (Y%)"** — L0 failure. Real test gap. Either write the missing test or adjust the manifest band downward (rare; band relaxation should be justified).
**`pnpm coverage:diff` says "lcov file not found"** — run `pnpm test -- --coverage && pnpm coverage:aggregate` first. The diff script reads the merged root `coverage/lcov.info`.
**`coverage/summary.json` differs every commit** — expected. It includes `generatedAt` (ISO timestamp) and `commit` (SHA). The snapshot workflow only commits it when the underlying numbers change; in local dev, regenerating it shows diff noise.
**Diff coverage flags a file I don't think should be gated** — check the allowlist in `scripts/coverage/diff.mjs`. If the file genuinely shouldn't be gated, extend the allowlist (and the tests in `diff.test.mjs`).
## Related
- [ADR-020](../decisions/adr-020-coverage-architecture.md) — full architectural rationale
- [ADR-011](../decisions/adr-011-tdd-foundation.md) — original TDD foundation (the thresholds originated here)
- [PRD 2026-05-13-coverage-architecture](../work/prds/2026-05-13-coverage-architecture.prd.md) — implementation seed with audit findings
- [docs/glossary.md](../glossary.md) — canonical vocabulary
- [docs/guides/conformance-quickref.md](./conformance-quickref.md) — sibling reference for the 5-gate conformance system