Closes the staleness gap after the 10-commit coverage epic shipped.
Doc sync (item 1 from the user's choice):
- CLAUDE.md Quick Start: adds pnpm coverage:aggregate / coverage:diff
/ mutate to the command listing
- CLAUDE.md: new "Sibling architecture: coverage (ADR-020)" section
after the conformance gate table — captures the 4-layer table +
points at docs/guides/coverage.md + ADR-020 + says agents must run
coverage:diff before reporting complete
- AGENTS.md preamble: now lists coverage as a parallel multi-latency
quality system alongside conformance, with the same gate / latency
framing
- PRD frontmatter: status draft -> shipped + shipped date +
shipping-commits list (all 10 SHAs anchoring the trace)
- PRD findings table: each row gets a Resolution column citing the
commit that closed it; conclusion text updated to past tense
- ADR-020 implementation phasing: rewritten as a status table with
each step linked to the commit that shipped it + Boot-time
assertFeatureConformance explicitly marked Deferred with rationale
- docs/guides/coverage.md: removed "Boot wiring lands in the next
story" line; replaced with the deferral rationale + clarified
that two readers (vitest, coverage:diff) consume the manifest
Sandcastle prompts (item 2 from the user's choice):
- .sandcastle/implementer.prompt.md: new "Coverage gates" section
after the conformance-gates list, requiring `pnpm test --coverage`,
`pnpm coverage:aggregate`, and `pnpm coverage:diff` to all pass
before reporting `complete`. Machine-readable JSON shape of
coverage:diff documented (status / uncovered[] / kind enum), with
explicit instructions on how to interpret each kind. Allowlist
expansion requires justification + test.
- .sandcastle/reviewer.prompt.md: AC coverage relabeled to "AC
coverage (acceptance criteria, not test coverage)" to disambiguate;
new check #7 "Coverage gates (ADR-020)" requiring CI's
Coverage — diff (L1) step green + per-layer thresholds met +
no silent allowlist expansion + manifest band drift detection.
Effect: future agent runs through sandcastle now treat coverage as a
first-class blocking gate, parallel to conformance. PRs no longer
discover coverage failures only via CI; the implementer is required
to check before reporting done, and the reviewer is required to
verify.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage
Architecture: ADR-020. Glossary: docs/glossary.md → Coverage.
The agent-first coverage architecture has four layers. This guide is the day-to-day reference for working with them.
The four layers at a glance
| Layer | Question it answers | Command |
|---|---|---|
| L0 Per-layer vitest thresholds | "Did the last test run meet the declared bands?" | pnpm test -- --coverage |
| L1 Diff coverage | "Did this PR/slice cover its own changed lines?" | pnpm coverage:diff |
| L2 Aggregate trend | "How is coverage trending across the repo?" | pnpm coverage:aggregate → coverage/summary.json |
| L3 Mutation testing | "Do my tests actually assert anything?" | pnpm mutate (opt-in, not in default pnpm test) |
Each layer answers a distinct question. They compose; none replaces the others.
Single source of truth: feature.manifest.ts
Every feature declares its coverage expectations once, in feature.manifest.ts:
export const myFeatureManifest = defineFeature({
// ...
coverage: {
bands: {
baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
"use-cases": {
statements: 100,
branches: 95,
functions: 100,
lines: 100,
},
controllers: {
statements: 100,
branches: 95,
functions: 100,
lines: 100,
},
},
mutationTargets: ["entities", "use-cases"],
},
} as const);
Two readers pick this up today:
- vitest.config.ts: uses vitestThresholdsFromBands(DEFAULT_COVERAGE_BANDS) from @repo/core-shared/conformance/coverage. Most features import DEFAULT_COVERAGE_BANDS directly (the manifest's coverage section matches the defaults). For features with custom bands, override at the vitest config too.
- pnpm coverage:diff: uses the bands for per-path expectations against the merged lcov.
(A third reader, a boot-time assertFeatureConformance coverage check, was specified in the PRD and explicitly deferred per ADR-020 — when both readers above derive from the same manifest, the drift it was supposed to catch is mechanically impossible. The manifest's coverage: field remains the declarative source of truth regardless of how many readers consume it.)
Edit the manifest and both readers pick up the change. Features with custom bands also need the vitest config override noted above.
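To make the first reader concrete, here is a minimal sketch of how a bands object could be turned into the shape vitest documents for coverage.thresholds (top-level numbers apply globally, glob keys scope a band to matching files). This is not the real vitestThresholdsFromBands from @repo/core-shared; the names toVitestThresholds and layerGlobs are hypothetical, and the controllers glob is a guess because this guide does not state that path.

```typescript
// Hypothetical sketch. The real vitestThresholdsFromBands may work differently.
type Band = { statements: number; branches: number; functions: number; lines: number };
type Bands = { baseline: Band; [layer: string]: Band | undefined };

// Assumed layer-to-glob mapping. The controllers path is a guess.
const layerGlobs: Record<string, string> = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/controllers/**",
};

function toVitestThresholds(
  bands: Bands,
  globs: Record<string, string>,
): Record<string, number | Band> {
  // Baseline numbers become the global thresholds.
  const thresholds: Record<string, number | Band> = { ...bands.baseline };
  for (const layer of Object.keys(bands)) {
    const band = bands[layer];
    const glob = globs[layer];
    // Skip the baseline itself and any layer without a known glob.
    if (layer === "baseline" || band === undefined || glob === undefined) continue;
    thresholds[glob] = { ...band };
  }
  return thresholds;
}

const thresholds = toVitestThresholds(
  {
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
  },
  layerGlobs,
);
console.log(thresholds);
```

Passing the glob mapping as a parameter keeps the layer names in the manifest independent of the directory layout, which is one plausible reason to keep the conversion in a shared helper.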
Daily workflow
Before pushing
pnpm test -- --coverage # L0 — per-package thresholds enforced
pnpm coverage:aggregate # L2 — produce coverage/lcov.info + summary.json
pnpm coverage:diff # L1 — fails if changed lines aren't covered
The diff coverage step compares against origin/main by default. To compare against a different base:
pnpm coverage:diff -- --base HEAD~1
pnpm coverage:diff -- --base origin/release
For machine consumption (e.g., the agent dispatch loop):
pnpm coverage:diff -- --json | jq .uncovered
Reading a failure
On failure, pnpm coverage:diff exits with code 1 and writes JSON to stdout and a human-readable summary to stderr:
stderr (human):
[coverage:diff] FAIL — 3 uncovered hit(s) across 2 file(s):
packages/blog/src/application/use-cases/publish-article.use-case.ts
uncovered lines: 47, 48
packages/auth/src/entities/models/session.ts
uncovered lines: 22
stdout (JSON, also written for the dispatch loop):
{
"status": "fail",
"summary": {
"filesChanged": 4,
"filesGated": 2,
"uncoveredCount": 3
},
"fileSummaries": [...],
"uncovered": [
{ "file": "...", "line": 47, "kind": "uncovered" },
{ "file": "...", "line": 48, "kind": "uncovered" },
{ "file": "...", "line": 22, "kind": "uncovered" }
]
}
kind is one of:
- uncovered: line is executable per lcov, execution count is 0
- no-coverage-data: entire file isn't in lcov (likely a new untested file)
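The two kinds can be illustrated with a small sketch of the classification step. This is not the code in scripts/coverage/diff.mjs; it assumes lcov data has already been reduced to per-file line counts, and the names classifyChangedLines, Lcov, and ChangedLines are hypothetical. It reports one entry per changed line, which the real script may or may not do for no-coverage-data.

```typescript
type Kind = "uncovered" | "no-coverage-data";
type Hit = { file: string; line: number; kind: Kind };

// Assumed reduction of lcov: file -> (executable line -> execution count).
type Lcov = Record<string, Record<number, number> | undefined>;
// Changed line numbers per file, e.g. derived from a git diff.
type ChangedLines = Record<string, number[]>;

function classifyChangedLines(changed: ChangedLines, lcov: Lcov): Hit[] {
  const hits: Hit[] = [];
  for (const file of Object.keys(changed)) {
    const counts = lcov[file];
    for (const line of changed[file]) {
      if (counts === undefined) {
        // The whole file is absent from lcov.
        hits.push({ file, line, kind: "no-coverage-data" });
      } else if (counts[line] === 0) {
        // Executable line that no test executed.
        hits.push({ file, line, kind: "uncovered" });
      }
      // Lines missing from counts are non-executable (comments, types) and are skipped.
    }
  }
  return hits;
}

const hits = classifyChangedLines(
  { "src/a.ts": [1, 2, 3], "src/b.ts": [5] },
  { "src/a.ts": { 1: 3, 2: 0 } },
);
console.log(hits);
```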
Fixing an uncovered slice
- Read the JSON. For each uncovered hit, navigate to <file>:<line>.
- Identify which test would have exercised that line. Usually it's missing a branch case or an error path.
- Add the test (TDD: write failing test → make it green).
- Re-run pnpm test --coverage --filter @repo/<feature> to verify.
- Re-run pnpm coverage:diff to confirm exit 0.
For no-coverage-data hits, write the sibling test file. The ESLint conformance rule usecase-must-have-test-file will start failing anyway if you don't.
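For an agent or script working through the steps above, it can help to collapse the uncovered array into one entry per file. The sketch below assumes only the fields shown in the JSON example (status, uncovered[].file, uncovered[].line, uncovered[].kind); the helper name toWorkItems and the sample paths are hypothetical.

```typescript
type UncoveredHit = { file: string; line: number; kind: string };
type DiffReport = { status: string; uncovered: UncoveredHit[] };

// Group hits by file and render "<file>:<line>,<line>" entries, sorted for stable output.
function toWorkItems(report: DiffReport): string[] {
  const linesByFile: Record<string, number[]> = {};
  for (const hit of report.uncovered) {
    if (linesByFile[hit.file] === undefined) linesByFile[hit.file] = [];
    linesByFile[hit.file].push(hit.line);
  }
  return Object.keys(linesByFile)
    .sort()
    .map((file) => {
      const lines = linesByFile[file].slice().sort((a, b) => a - b);
      return file + ":" + lines.join(",");
    });
}

// Sample input with made-up paths, shaped like the stdout JSON shown earlier.
const sample: DiffReport = JSON.parse(
  '{"status":"fail","uncovered":[' +
    '{"file":"src/b.ts","line":22,"kind":"uncovered"},' +
    '{"file":"src/a.ts","line":48,"kind":"uncovered"},' +
    '{"file":"src/a.ts","line":47,"kind":"uncovered"}]}',
);
const workItems = toWorkItems(sample);
console.log(workItems); // [ 'src/a.ts:47,48', 'src/b.ts:22' ]
```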
What's exempt (the allowlist)
The diff coverage gate skips:
- Test files (*.test.ts, *.test.tsx, *.test.mjs)
- Fixtures, factories, contracts, seeds (__fixtures__/, __factories__/, __contracts__/, __seeds__/)
- Config files (*.config.{ts,js,mjs,cjs}, package.json, tsconfig.*.json, turbo.json)
- Docs and data (*.md, *.json, *.yaml, .gitignore, .npmrc)
- Shell scripts (*.sh, *.bash)
- Dev tooling under scripts/ and turbo/generators/
- Per-feature excludes mirrored from vitest (di/bind-production.ts, application/repositories/**, application/services/**, integrations/cms/**, ui/**, *.interface.ts, index.ts barrels)
- Build artifacts (dist/, .next/, .turbo/, node_modules/, coverage/)
The allowlist lives in scripts/coverage/diff.mjs and is unit-tested.
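As a rough illustration of how such an allowlist can be expressed, the sketch below encodes a subset of the categories above as regular expressions. It is not the content of scripts/coverage/diff.mjs; the EXEMPT array, the isExempt helper, and the exact patterns are assumptions made for this example.

```typescript
// Hypothetical patterns covering a subset of the documented categories.
const EXEMPT: RegExp[] = [
  /\.test\.(ts|tsx|mjs)$/, // test files
  /(^|\/)__(fixtures|factories|contracts|seeds)__\//, // fixtures, factories, contracts, seeds
  /(^|\/)[^/]+\.config\.(ts|js|mjs|cjs)$/, // config files
  /\.(md|json|yaml)$/, // docs and data
  /\.(sh|bash)$/, // shell scripts
  /(^|\/)(dist|\.next|\.turbo|node_modules)\//, // build artifacts
  /\.interface\.ts$/, // interface-only modules
  /(^|\/)index\.ts$/, // barrels
];

// A path is exempt from the diff gate if any pattern matches.
function isExempt(path: string): boolean {
  return EXEMPT.some((pattern) => pattern.test(path));
}

console.log(isExempt("packages/auth/src/entities/models/session.test.ts")); // true
console.log(isExempt("packages/auth/src/entities/models/session.ts")); // false
```

Anchoring matters in patterns like these: an unanchored directory match can exempt source files that merely live under a similarly named folder, which is one reason unit tests around the allowlist are valuable.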
Adjusting bands
To raise the bar on a feature
Edit packages/<feature>/src/feature.manifest.ts:
coverage: {
bands: {
baseline: { statements: 90, branches: 85, functions: 90, lines: 90 }, // tighter
entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
"use-cases": { statements: 100, branches: 100, functions: 100, lines: 100 }, // bumped branches
controllers: { statements: 100, branches: 95, functions: 100, lines: 100 },
},
}
If the new bands are stricter than the defaults, also update packages/<feature>/vitest.config.ts to use vitestThresholdsFromManifest(myFeatureManifest) instead of DEFAULT_COVERAGE_BANDS. (Note: importing the manifest from a vitest config has tooling constraints, so treat the DEFAULT_COVERAGE_BANDS route as the default path.)
To skip a layer
Omit it from bands. The layer falls through to baseline:
coverage: {
bands: {
baseline: { ... },
entities: { ... },
// controllers omitted -> matches baseline
},
}
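The fall-through rule amounts to a lookup with a default. A minimal sketch, assuming a hypothetical resolveBand helper (the actual resolution logic in the repo is not shown in this guide):

```typescript
type Band = { statements: number; branches: number; functions: number; lines: number };
type Bands = { baseline: Band; [layer: string]: Band | undefined };

// A layer without its own band inherits the baseline.
function resolveBand(bands: Bands, layer: string): Band {
  return bands[layer] ?? bands.baseline;
}

const bands: Bands = {
  baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
  // controllers omitted
};

console.log(resolveBand(bands, "entities").branches); // 100
console.log(resolveBand(bands, "controllers").branches); // 75
```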
CI behavior
.github/workflows/ci.yml (validate job) runs three coverage steps after the test step:
- Test with coverage: produces per-package coverage/lcov.info
- Coverage — aggregate (L2): merges to root coverage/lcov.info + coverage/summary.json
- Coverage — diff (L1): only on pull requests, diffs against origin/<base-ref>
On merge to main, .github/workflows/coverage-snapshot.yml re-aggregates and commits the updated coverage/summary.json back to main. Trend history accumulates via git log -- coverage/summary.json.
Reading the trend
git log --oneline --follow -- coverage/summary.json | head -10
git show <sha> -- coverage/summary.json | grep -E '"statements"|"branches"'
coverage/summary.json is the only committed coverage artifact. Each snapshot includes:
- generatedAt: ISO timestamp
- commit: short SHA
- repo: repo-wide percentages + raw counts
- byPackage: per-package percentages, keyed by @repo/<name>
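Because generatedAt and commit change on every regeneration, comparing two snapshots is only meaningful once those fields are set aside. The sketch below shows one way to do that. The Snapshot type uses only the four field names listed above and treats the inner shape of repo and byPackage as unknown; numbersChanged is a hypothetical helper, and the comparison assumes the generator writes keys in a stable order.

```typescript
type Snapshot = {
  generatedAt: string;
  commit: string;
  repo: unknown; // inner shape not specified in this guide
  byPackage: Record<string, unknown>;
};

// Compare only the coverage numbers, ignoring the volatile metadata fields.
// JSON.stringify is key-order sensitive, so this assumes stable key order.
function numbersChanged(prev: Snapshot, next: Snapshot): boolean {
  const numbersOnly = (s: Snapshot) =>
    JSON.stringify({ repo: s.repo, byPackage: s.byPackage });
  return numbersOnly(prev) !== numbersOnly(next);
}

// Illustrative values only; the percentages and SHAs are made up.
const before: Snapshot = {
  generatedAt: "2026-05-13T10:00:00Z",
  commit: "abc1234",
  repo: { lines: 91.2 },
  byPackage: { "@repo/auth": { lines: 95 } },
};
const sameNumbers: Snapshot = { ...before, generatedAt: "2026-05-14T10:00:00Z", commit: "def5678" };
const lowerAuth: Snapshot = { ...sameNumbers, byPackage: { "@repo/auth": { lines: 93 } } };

console.log(numbersChanged(before, sameNumbers)); // false
console.log(numbersChanged(before, lowerAuth)); // true
```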
Mutation testing (L3)
Stryker runs mutation testing on entities/ and application/use-cases/, the pure-business-logic surface. It is not part of pnpm test (too slow); it runs on demand and nightly via a GitHub Action.
Running
pnpm mutate # every feature with a stryker.config.json
pnpm mutate -- --filter @repo/auth # one feature
pnpm mutate -- --since main # incremental against base ref
pnpm mutate -- --json # machine-readable summary
Configuration
Each feature has a slim stryker.config.json that extends the shared base:
{
"$schema": "../../node_modules/@stryker-mutator/core/schema/stryker-schema.json",
"extends": "@repo/core-testing/stryker.base.json"
}
The base lives at packages/core-testing/stryker.base.json and defines:
- Test runner: vitest (uses each feature's vitest.config.ts)
- Scope: src/entities/**/*.ts and src/application/use-cases/**/*.ts (excludes tests/factories/contracts)
- Thresholds: high 90 / low 80 / break 80 (break is the fail threshold)
- Reporters: progress, html (reports/mutation/index.html), json (reports/mutation/mutation.json)
- Incremental mode: enabled (subsequent runs skip mutants whose source + tests haven't changed)
- Concurrency: 4 workers
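To show how the three threshold values interact, here is a small sketch that assumes Stryker's documented semantics: a score at or above high is reported as good, a score between low and high as a warning, a score below low as danger, and a score below break fails the run. The classifyScore function is hypothetical and is not part of Stryker's API.

```typescript
type Thresholds = { high: number; low: number; break: number | null };

// Classify a mutation score (0 to 100) against the thresholds.
function classifyScore(score: number, t: Thresholds) {
  const rating = score >= t.high ? "ok" : score >= t.low ? "warning" : "danger";
  // A null break threshold means the run never fails on score alone.
  const failsBuild = t.break !== null && score < t.break;
  return { rating, failsBuild };
}

// Values from the shared base config described above.
const base: Thresholds = { high: 90, low: 80, break: 80 };

console.log(classifyScore(92, base)); // { rating: 'ok', failsBuild: false }
console.log(classifyScore(85, base)); // { rating: 'warning', failsBuild: false }
console.log(classifyScore(79.9, base)); // { rating: 'danger', failsBuild: true }
```

With low and break both at 80, any score in the danger range also fails the run, so the warning range (80 to 90) is the only one that passes without being reported as good.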
To override per feature (rare), add fields to the feature's stryker.config.json:
{
"extends": "@repo/core-testing/stryker.base.json",
"thresholds": { "high": 95, "low": 85, "break": 85 },
"mutate": ["src/entities/**/*.ts"]
}
CI: nightly run + on-demand
.github/workflows/mutation-nightly.yml runs Stryker across every feature at 02:30 UTC and on workflow_dispatch. The dispatch UI accepts a filter input (e.g. @repo/auth) for targeted reruns. Reports are uploaded as the mutation-reports artifact (30-day retention). On meaningful score drops, the workflow opens a tracking issue labelled mutation-testing.
What you're looking for
Stryker's mutation.json reports the mutation score (killed mutants / total) per file. A surviving mutant means: the mutator changed source code (e.g., < → <=, && → ||, removed a line, etc.), reran the tests, and they STILL passed. That's a test that exists + executes the code but doesn't actually assert behavior.
Fix: read the surviving mutant's diff in reports/mutation/index.html, identify the assertion that should have caught it, add the assertion.
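A surviving mutant is easiest to see on a boundary condition. The example below is invented for illustration (isExpired is not taken from the repo): the mutant swaps > for >=, a test that never probes the boundary passes against both versions, and a boundary assertion tells them apart.

```typescript
// Original: a session is expired once `now` is strictly past `expiresAt`.
function isExpired(now: number, expiresAt: number): boolean {
  return now > expiresAt;
}

// The kind of mutant a mutation tool might generate: `>` becomes `>=`.
function isExpiredMutant(now: number, expiresAt: number): boolean {
  return now >= expiresAt;
}

type Impl = (now: number, expiresAt: number) => boolean;

// Weak test: executes the code but never checks now === expiresAt.
function weakTest(impl: Impl): boolean {
  return impl(200, 100) === true && impl(50, 100) === false;
}

// Boundary test: pins the behavior exactly at expiresAt.
function boundaryTest(impl: Impl): boolean {
  return impl(100, 100) === false;
}

console.log(weakTest(isExpired), weakTest(isExpiredMutant)); // true true  (mutant survives)
console.log(boundaryTest(isExpired), boundaryTest(isExpiredMutant)); // true false (mutant killed)
```

Line coverage is identical for both tests here, which is why L3 exists as a separate layer from L0 and L1.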
Troubleshooting
"Cannot find module '@vitest/coverage-v8'" — your feature's package.json is missing @vitest/coverage-v8 as a dev dep. Add it. (This was the issue surfaced for media during the L0 audit.)
"Coverage for lines (X%) does not meet 'src/...' threshold (Y%)" — L0 failure. Real test gap. Either write the missing test or adjust the manifest band downward (rare; band relaxation should be justified).
pnpm coverage:diff says "lcov file not found" — run pnpm test -- --coverage && pnpm coverage:aggregate first. The diff script reads the merged root coverage/lcov.info.
coverage/summary.json differs every commit — expected. It includes generatedAt (ISO timestamp) and commit (SHA). The snapshot workflow only commits it when the underlying numbers change; in local dev, regenerating it shows diff noise.
Diff coverage flags a file I don't think should be gated — check the allowlist in scripts/coverage/diff.mjs. If the file genuinely shouldn't be gated, extend the allowlist (and the tests in diff.test.mjs).
Related
- ADR-020 — full architectural rationale
- ADR-011 — original TDD foundation (the thresholds originated here)
- PRD 2026-05-13-coverage-architecture — implementation seed with audit findings
- docs/glossary.md — canonical vocabulary
- docs/guides/conformance-quickref.md — sibling reference for the 5-gate conformance system