Danijel Martinek fc27eef6eb docs(coverage): sync docs to shipped state + wire sandcastle prompts

Closes the staleness gap after the 10-commit coverage epic shipped.

Doc sync (item 1 from the user's choice):
  - CLAUDE.md Quick Start: adds pnpm coverage:aggregate / coverage:diff
    / mutate to the command listing
  - CLAUDE.md: new "Sibling architecture: coverage (ADR-020)" section
    after the conformance gate table — captures the 4-layer table +
    points at docs/guides/coverage.md + ADR-020 + says agents must run
    coverage:diff before reporting complete
  - AGENTS.md preamble: now lists coverage as a parallel multi-latency
    quality system alongside conformance, with the same gate / latency
    framing
  - PRD frontmatter: status draft -> shipped + shipped date +
    shipping-commits list (all 10 SHAs anchoring the trace)
  - PRD findings table: each row gets a Resolution column citing the
    commit that closed it; conclusion text updated to past tense
  - ADR-020 implementation phasing: rewritten as a status table with
    each step linked to the commit that shipped it + Boot-time
    assertFeatureConformance explicitly marked Deferred with rationale
  - docs/guides/coverage.md: removed "Boot wiring lands in the next
    story" line; replaced with the deferral rationale + clarified
    that two readers (vitest, coverage:diff) consume the manifest

Sandcastle prompts (item 2 from the user's choice):
  - .sandcastle/implementer.prompt.md: new "Coverage gates" section
    after the conformance-gates list, requiring `pnpm test --coverage`,
    `pnpm coverage:aggregate`, and `pnpm coverage:diff` to all pass
    before reporting `complete`. Machine-readable JSON shape of
    coverage:diff documented (status / uncovered[] / kind enum), with
    explicit instructions on how to interpret each kind. Allowlist
    expansion requires justification + test.
  - .sandcastle/reviewer.prompt.md: AC coverage relabeled to "AC
    coverage (acceptance criteria, not test coverage)" to disambiguate;
    new check #7 "Coverage gates (ADR-020)" requiring CI's
    Coverage — diff (L1) step green + per-layer thresholds met +
    no silent allowlist expansion + manifest band drift detection.

Effect: future agent runs through sandcastle now treat coverage as a
first-class blocking gate, parallel to conformance. PRs no longer
discover coverage failures only via CI; the implementer is required
to check before reporting done, and the reviewer is required to
verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-13 16:47:16 +02:00

Coverage

Architecture: ADR-020. Glossary: docs/glossary.md → Coverage.

The agent-first coverage architecture has four layers. This guide is the day-to-day reference for working with them.

The four layers at a glance

| Layer | Question it answers | Command |
| --- | --- | --- |
| L0 Per-layer vitest thresholds | "Did the last test run meet the declared bands?" | `pnpm test -- --coverage` |
| L1 Diff coverage | "Did this PR/slice cover its own changed lines?" | `pnpm coverage:diff` |
| L2 Aggregate trend | "How is coverage trending across the repo?" | `pnpm coverage:aggregate` → `coverage/summary.json` |
| L3 Mutation testing | "Do my tests actually assert anything?" | `pnpm mutate` (opt-in, not in default `pnpm test`) |

Each layer answers a distinct question. They compose; none replaces the others.

Single source of truth: `feature.manifest.ts`

Every feature declares its coverage expectations once, in feature.manifest.ts:

export const myFeatureManifest = defineFeature({
  // ...
  coverage: {
    bands: {
      baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
      entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
      "use-cases": {
        statements: 100,
        branches: 95,
        functions: 100,
        lines: 100,
      },
      controllers: {
        statements: 100,
        branches: 95,
        functions: 100,
        lines: 100,
      },
    },
    mutationTargets: ["entities", "use-cases"],
  },
} as const);

Two readers pick this up today:

vitest.config.ts — uses vitestThresholdsFromBands(DEFAULT_COVERAGE_BANDS) from @repo/core-shared/conformance/coverage. Most features import DEFAULT_COVERAGE_BANDS directly (the manifest's coverage section matches the defaults). For features with custom bands, override at the vitest config too.
pnpm coverage:diff — uses the bands for per-path expectations against the merged lcov.

(A third reader, a boot-time assertFeatureConformance coverage check, was specified in the PRD and explicitly deferred per ADR-020 — when both readers above derive from the same manifest, the drift it was supposed to catch is mechanically impossible. The manifest's coverage: field remains the declarative source of truth regardless of how many readers consume it.)

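Edit the manifest first. pnpm coverage:diff takes its per-path expectations from those bands; a feature whose bands differ from the defaults also needs its vitest.config.ts updated, as described under Adjusting bands.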
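
As a rough illustration of how bands could translate into vitest thresholds, the sketch below spreads the baseline band into the global thresholds and adds one glob-keyed entry per declared layer, which is the form vitest's coverage.thresholds option accepts for per-path limits. This is not the real vitestThresholdsFromBands. The name sketchThresholdsFromBands, the layerGlobs parameter, and the controllers glob are assumptions made for illustration only.

```typescript
interface Band { statements: number; branches: number; functions: number; lines: number }
interface Bands { baseline: Band; [layer: string]: Band | undefined }

// Hypothetical helper: baseline becomes the global thresholds, and every other
// declared layer becomes a glob-keyed override. Layers without a band (or
// without a known glob) get no override, so only the global thresholds apply.
function sketchThresholdsFromBands(
  bands: Bands,
  layerGlobs: Record<string, string>,
): Record<string, number | Band> {
  const thresholds: Record<string, number | Band> = { ...bands.baseline };
  for (const [layer, band] of Object.entries(bands)) {
    const glob = layerGlobs[layer];
    if (layer !== "baseline" && band && glob) {
      thresholds[glob] = band;
    }
  }
  return thresholds;
}

// Example: entities is declared, controllers is omitted.
const exampleThresholds = sketchThresholdsFromBands(
  {
    baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
  },
  // Assumed globs; the real layer-to-path mapping lives in the shared helper.
  { entities: "src/entities/**", controllers: "src/controllers/**" },
);
```

In this sketch a per-layer failure would surface against the glob key, which matches the shape of the L0 error message quoted under Troubleshooting.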

Daily workflow

Before pushing

pnpm test -- --coverage     # L0 — per-package thresholds enforced
pnpm coverage:aggregate     # L2 — produce coverage/lcov.info + summary.json
pnpm coverage:diff          # L1 — fails if changed lines aren't covered

The diff coverage step compares against origin/main by default. To compare against a different base:

pnpm coverage:diff -- --base HEAD~1
pnpm coverage:diff -- --base origin/release

For machine consumption (e.g., the agent dispatch loop):

pnpm coverage:diff -- --json | jq .uncovered

Reading a failure

On failure, pnpm coverage:diff exits with code 1 and writes two outputs: a human-readable summary on stderr and a JSON report on stdout.

stderr (human):

[coverage:diff] FAIL — 3 uncovered hit(s) across 2 file(s):
  packages/blog/src/application/use-cases/publish-article.use-case.ts
    uncovered lines: 47, 48
  packages/auth/src/entities/models/session.ts
    uncovered lines: 22

stdout (JSON, also written for the dispatch loop):

{
  "status": "fail",
  "summary": {
    "filesChanged": 4,
    "filesGated": 2,
    "uncoveredCount": 3
  },
  "fileSummaries": [...],
  "uncovered": [
    { "file": "...", "line": 47, "kind": "uncovered" },
    { "file": "...", "line": 48, "kind": "uncovered" },
    { "file": "...", "line": 22, "kind": "uncovered" }
  ]
}

kind is one of:

uncovered — line is executable per lcov, execution count is 0
no-coverage-data — entire file isn't in lcov (likely a new untested file)
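
The two kinds can be illustrated with a small classifier over lcov data. This is a sketch of the idea, not the code in scripts/coverage/diff.mjs: parseLcov, classifyChangedLines, and the changed-lines input shape are hypothetical names. It relies only on the standard lcov record lines `SF:<path>`, `DA:<line>,<hits>`, and `end_of_record`. It emits one hit per changed line for files missing from lcov; the real script may report such files differently.

```typescript
type Kind = "uncovered" | "no-coverage-data";
interface Hit { file: string; line: number; kind: Kind }

// Parse lcov text into file -> (line number -> execution count).
function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | undefined;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      current = new Map<number, number>();
      files.set(line.slice(3), current);
    } else if (line.startsWith("DA:") && current) {
      const [lineNo, count] = line.slice(3).split(",");
      current.set(Number(lineNo), Number(count));
    } else if (line === "end_of_record") {
      current = undefined;
    }
  }
  return files;
}

// Classify each changed line against the parsed coverage.
function classifyChangedLines(
  changed: Record<string, number[]>,
  coverage: Map<string, Map<number, number>>,
): Hit[] {
  const hits: Hit[] = [];
  for (const [file, lines] of Object.entries(changed)) {
    const fileCoverage = coverage.get(file);
    for (const line of lines) {
      if (!fileCoverage) {
        hits.push({ file, line, kind: "no-coverage-data" }); // file absent from lcov
      } else if (fileCoverage.get(line) === 0) {
        hits.push({ file, line, kind: "uncovered" }); // executable, never run
      }
      // Lines with no DA record are not executable; counts above 0 are covered.
    }
  }
  return hits;
}
```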

Fixing an uncovered slice

1. Read the JSON. For each uncovered hit, navigate to <file>:<line>.
2. Identify which test would have exercised that line. Usually it's missing a branch case or an error path.
3. Add the test (TDD: write failing test → make it green).
4. Re-run pnpm test --coverage --filter @repo/<feature> to verify.
5. Re-run pnpm coverage:diff to confirm exit 0.

For no-coverage-data hits, write the sibling test file. The ESLint conformance rule usecase-must-have-test-file will start failing anyway if you don't.
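
For an agent or script working through the steps above, the JSON report can be reduced to one action item per file. The sketch below assumes only the fields shown in the sample report (file, line, kind). The DiffReport type and the todoList function are hypothetical names, not part of the shipped tooling.

```typescript
interface UncoveredHit { file: string; line: number; kind: "uncovered" | "no-coverage-data" }
interface DiffReport { status: string; uncovered: UncoveredHit[] }

// Group hits by file and turn each group into a single instruction.
function todoList(report: DiffReport): string[] {
  const byFile = new Map<string, UncoveredHit[]>();
  for (const hit of report.uncovered) {
    const hits = byFile.get(hit.file) ?? [];
    hits.push(hit);
    byFile.set(hit.file, hits);
  }
  const todos: string[] = [];
  byFile.forEach((hits, file) => {
    if (hits.some((h) => h.kind === "no-coverage-data")) {
      // The whole file is missing from lcov, so line numbers are not useful.
      todos.push(`${file}: write the sibling test file`);
    } else {
      const lines = hits.map((h) => h.line).sort((a, b) => a - b).join(", ");
      todos.push(`${file}: cover lines ${lines}`);
    }
  });
  return todos;
}
```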

What's exempt (the allowlist)

The diff coverage gate skips:

Test files (*.test.ts, *.test.tsx, *.test.mjs)
Fixtures, factories, contracts, seeds (__fixtures__/, __factories__/, __contracts__/, __seeds__/)
Config files (*.config.{ts,js,mjs,cjs}, package.json, tsconfig.*.json, turbo.json)
Docs and data (*.md, *.json, *.yaml, .gitignore, .npmrc)
Shell scripts (*.sh, *.bash)
Dev tooling under scripts/ and turbo/generators/
Per-feature excludes mirrored from vitest (di/bind-production.ts, application/repositories/**, application/services/**, integrations/cms/**, ui/**, *.interface.ts, index.ts barrels)
Build artifacts (dist/, .next/, .turbo/, node_modules/, coverage/)

The allowlist lives in scripts/coverage/diff.mjs and is unit-tested.
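
To make the shape of such an allowlist concrete, here is a minimal predicate covering a subset of the categories listed above. It is a sketch under assumptions, not the contents of scripts/coverage/diff.mjs: the names EXEMPT_PATTERNS and isExempt are hypothetical, and the per-feature excludes are only partly represented.

```typescript
// Each pattern corresponds to one category from the list above.
const EXEMPT_PATTERNS: RegExp[] = [
  /\.test\.(ts|tsx|mjs)$/,                               // test files
  /(^|\/)__(fixtures|factories|contracts|seeds)__\//,    // fixtures, factories, contracts, seeds
  /\.config\.(ts|js|mjs|cjs)$/,                          // config files
  /\.(md|json|yaml)$/,                                   // docs and data (covers package.json too)
  /\.(sh|bash)$/,                                        // shell scripts
  /^(scripts|turbo\/generators)\//,                      // dev tooling
  /(^|\/)(dist|\.next|\.turbo|node_modules|coverage)\//, // build artifacts
  /(^|\/)index\.ts$/,                                    // barrels
  /\.interface\.ts$/,                                    // interface-only files
];

// A repo-relative path is exempt from the diff gate if any pattern matches.
function isExempt(path: string): boolean {
  return EXEMPT_PATTERNS.some((pattern) => pattern.test(path));
}
```

Keeping the list as data makes each category easy to unit-test and makes any expansion visible in review.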

Adjusting bands

To raise the bar on a feature

Edit packages/<feature>/src/feature.manifest.ts:

coverage: {
  bands: {
    baseline: { statements: 90, branches: 85, functions: 90, lines: 90 },  // tighter
    entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
    "use-cases": { statements: 100, branches: 100, functions: 100, lines: 100 },  // bumped branches
    controllers: { statements: 100, branches: 95, functions: 100, lines: 100 },
  },
}

If the new bands are stricter than the defaults, also update packages/<feature>/vitest.config.ts to use vitestThresholdsFromManifest(myFeatureManifest) instead of DEFAULT_COVERAGE_BANDS. (Note: importing the manifest from a vitest config has tooling constraints, so treat DEFAULT_COVERAGE_BANDS as the default path and switch only when a feature needs custom bands.)

To skip a layer

Omit it from bands. The layer is not exempt from coverage; it falls through to the baseline band:

coverage: {
  bands: {
    baseline: { ... },
    entities: { ... },
    // controllers omitted -> matches baseline
  },
}
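
The fall-through rule amounts to a one-line lookup. The sketch below shows it with hypothetical names (Band, Bands, resolveBand); the shipped helper may be structured differently.

```typescript
interface Band { statements: number; branches: number; functions: number; lines: number }
interface Bands { baseline: Band; [layer: string]: Band | undefined }

// Return the band declared for a layer, or the baseline when it is omitted.
function resolveBand(bands: Bands, layer: string): Band {
  return bands[layer] ?? bands.baseline;
}

const sampleBands: Bands = {
  baseline: { statements: 80, branches: 75, functions: 80, lines: 80 },
  entities: { statements: 100, branches: 100, functions: 100, lines: 100 },
  // controllers omitted on purpose
};

const controllersBand = resolveBand(sampleBands, "controllers"); // same object as baseline
const entitiesBand = resolveBand(sampleBands, "entities");       // the declared entities band
```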

CI behavior

.github/workflows/ci.yml (validate job) runs three coverage steps after the test step:

Test with coverage — produces per-package coverage/lcov.info
Coverage — aggregate (L2) — merges to root coverage/lcov.info + coverage/summary.json
Coverage — diff (L1) — only on pull requests, diffs against origin/<base-ref>

On merge to main, .github/workflows/coverage-snapshot.yml re-aggregates and commits the updated coverage/summary.json back to main. Trend history accumulates via git log -- coverage/summary.json.

Reading the trend

git log --oneline --follow -- coverage/summary.json | head -10
git show <sha> -- coverage/summary.json | grep -E '"statements"|"branches"'

coverage/summary.json is the only committed coverage artifact. Each snapshot includes:

generatedAt — ISO timestamp
commit — short SHA
repo — repo-wide percentages + raw counts
byPackage — per-package percentages, keyed by @repo/<name>
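
Because generatedAt and commit change on every regeneration, a meaningful comparison of two snapshots has to ignore those two fields. The sketch below does that without assuming anything about the inner layout of repo or byPackage. The names Snapshot, stripVolatile, canonical, and numbersChanged are hypothetical, and the values in the usage example are made up.

```typescript
type Snapshot = { generatedAt: string; commit: string; [key: string]: unknown };

// Drop the two fields that differ on every run.
function stripVolatile(snapshot: Snapshot): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...snapshot };
  delete copy.generatedAt;
  delete copy.commit;
  return copy;
}

// Serialize with sorted object keys so key order does not affect equality.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonical(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// True only when something other than generatedAt/commit differs.
function numbersChanged(previous: Snapshot, next: Snapshot): boolean {
  return canonical(stripVolatile(previous)) !== canonical(stripVolatile(next));
}
```

For example, two snapshots that differ only in generatedAt and commit compare as unchanged, while a change inside repo or byPackage is detected.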

Mutation testing (L3)

L3 runs Stryker mutation testing against entities/ and application/use-cases/, the pure business-logic surface. It is not part of pnpm test because it is slow; it runs on demand and nightly via a GitHub Action.

Running

pnpm mutate                        # every feature with a stryker.config.json
pnpm mutate -- --filter @repo/auth # one feature
pnpm mutate -- --since main        # incremental against base ref
pnpm mutate -- --json              # machine-readable summary

Configuration

Each feature has a slim stryker.config.json that extends the shared base:

{
  "$schema": "../../node_modules/@stryker-mutator/core/schema/stryker-schema.json",
  "extends": "@repo/core-testing/stryker.base.json"
}

The base lives at packages/core-testing/stryker.base.json and defines:

Test runner: vitest (uses each feature's vitest.config.ts)
Scope: src/entities/**/*.ts and src/application/use-cases/**/*.ts (excludes tests/factories/contracts)
Thresholds: high 90 / low 80 / break 80 (break is the fail threshold)
Reporters: progress, html (reports/mutation/index.html), json (reports/mutation/mutation.json)
Incremental mode: enabled (subsequent runs skip mutants whose source + tests haven't changed)
Concurrency: 4 workers
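
To show how the three threshold values interact, the sketch below rates a score against high and low and separately checks it against break. It uses the simplified score definition from this guide (killed mutants over total); Stryker's own calculation is more detailed. The names mutationScore and rateScore and the rating labels are hypothetical.

```typescript
interface Thresholds { high: number; low: number; break: number }

// Simplified score: percentage of mutants that the tests killed.
function mutationScore(killed: number, total: number): number {
  if (total <= 0) throw new RangeError("total must be positive");
  return (killed * 100) / total;
}

// high/low drive the report rating; break decides whether the run fails.
function rateScore(
  score: number,
  thresholds: Thresholds,
): { rating: "high" | "medium" | "low"; breaksBuild: boolean } {
  const rating =
    score >= thresholds.high ? "high" : score >= thresholds.low ? "medium" : "low";
  return { rating, breaksBuild: score < thresholds.break };
}

// Thresholds taken from the shared base config described above.
const baseThresholds: Thresholds = { high: 90, low: 80, break: 80 };
const okRun = rateScore(mutationScore(17, 20), baseThresholds);   // 85: medium, passes
const failRun = rateScore(mutationScore(15, 20), baseThresholds); // 75: low, fails
```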

To override per feature (rare), add fields to the feature's stryker.config.json:

{
  "extends": "@repo/core-testing/stryker.base.json",
  "thresholds": { "high": 95, "low": 85, "break": 85 },
  "mutate": ["src/entities/**/*.ts"]
}

CI: nightly run + on-demand

.github/workflows/mutation-nightly.yml runs Stryker across every feature nightly at 02:30 UTC and on workflow_dispatch. The dispatch UI accepts a filter input (e.g. @repo/auth) for targeted reruns. Reports are uploaded as the mutation-reports artifact (30-day retention). On meaningful score drops, the workflow opens a tracking issue labelled mutation-testing.

What you're looking for

Stryker's mutation.json reports the mutation score (killed mutants / total) per file. A surviving mutant means: the mutator changed source code (e.g., < → <=, && → ||, removed a line, etc.), reran the tests, and they STILL passed. That's a test that exists + executes the code but doesn't actually assert behavior.

Fix: read the surviving mutant's diff in reports/mutation/index.html, identify the assertion that should have caught it, add the assertion.
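
A toy example of a surviving mutant and the assertion that kills it. The function isSessionValid and both test helpers are invented for illustration and do not come from the codebase. The mutant is written out by hand here, whereas Stryker would generate it automatically.

```typescript
// Source under test: a session is valid strictly before its expiry time.
function isSessionValid(now: number, expiresAt: number): boolean {
  return now < expiresAt;
}

// The mutant: `<` replaced by `<=`.
function isSessionValidMutant(now: number, expiresAt: number): boolean {
  return now <= expiresAt;
}

type Impl = (now: number, expiresAt: number) => boolean;

// Weak test: executes the line but never probes the boundary.
function weakTest(impl: Impl): boolean {
  return impl(50, 100) === true && impl(200, 100) === false;
}

// Strong test: adds the boundary assertion that distinguishes `<` from `<=`.
function strongTest(impl: Impl): boolean {
  return weakTest(impl) && impl(100, 100) === false;
}

const mutantSurvivesWeak = weakTest(isSessionValidMutant);      // true: mutant survives
const mutantKilledByStrong = !strongTest(isSessionValidMutant); // true: mutant is killed
```

Both tests give the same line coverage, which is why this gap shows up at L3 and not at L0 or L1.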

Troubleshooting

"Cannot find module '@vitest/coverage-v8'" — your feature's package.json is missing @vitest/coverage-v8 as a dev dep. Add it. (This was the issue surfaced for media during the L0 audit.)

"Coverage for lines (X%) does not meet 'src/...' threshold (Y%)" — L0 failure. Real test gap. Either write the missing test or adjust the manifest band downward (rare; band relaxation should be justified).

pnpm coverage:diff says "lcov file not found" — run pnpm test -- --coverage && pnpm coverage:aggregate first. The diff script reads the merged root coverage/lcov.info.

coverage/summary.json differs every commit — expected. It includes generatedAt (ISO timestamp) and commit (SHA). The snapshot workflow only commits it when the underlying numbers change; in local dev, regenerating it shows diff noise.

Diff coverage flags a file I don't think should be gated — check the allowlist in scripts/coverage/diff.mjs. If the file genuinely shouldn't be gated, extend the allowlist (and the tests in diff.test.mjs).

See also

ADR-020 — full architectural rationale
ADR-011 — original TDD foundation (the thresholds originated here)
PRD 2026-05-13-coverage-architecture — implementation seed with audit findings
docs/glossary.md — canonical vocabulary
docs/guides/conformance-quickref.md — sibling reference for the 5-gate conformance system
