agentic-dev/2026-05-13-coverage-architecture.prd.md at fc27eef6eb7f80c500e562f7b5e6fb5b85b5683c


Danijel Martinek fc27eef6eb docs(coverage): sync docs to shipped state + wire sandcastle prompts

Closes the staleness gap after the 10-commit coverage epic shipped.

Doc sync (item 1 from the user's choice):
  - CLAUDE.md Quick Start: adds pnpm coverage:aggregate / coverage:diff
    / mutate to the command listing
  - CLAUDE.md: new "Sibling architecture: coverage (ADR-020)" section
    after the conformance gate table — captures the 4-layer table +
    points at docs/guides/coverage.md + ADR-020 + says agents must run
    coverage:diff before reporting complete
  - AGENTS.md preamble: now lists coverage as a parallel multi-latency
    quality system alongside conformance, with the same gate / latency
    framing
  - PRD frontmatter: status draft -> shipped + shipped date +
    shipping-commits list (all 10 SHAs anchoring the trace)
  - PRD findings table: each row gets a Resolution column citing the
    commit that closed it; conclusion text updated to past tense
  - ADR-020 implementation phasing: rewritten as a status table with
    each step linked to the commit that shipped it + Boot-time
    assertFeatureConformance explicitly marked Deferred with rationale
  - docs/guides/coverage.md: removed "Boot wiring lands in the next
    story" line; replaced with the deferral rationale + clarified
    that two readers (vitest, coverage:diff) consume the manifest

Sandcastle prompts (item 2 from the user's choice):
  - .sandcastle/implementer.prompt.md: new "Coverage gates" section
    after the conformance-gates list, requiring `pnpm test --coverage`,
    `pnpm coverage:aggregate`, and `pnpm coverage:diff` to all pass
    before reporting `complete`. Machine-readable JSON shape of
    coverage:diff documented (status / uncovered[] / kind enum), with
    explicit instructions on how to interpret each kind. Allowlist
    expansion requires justification + test.
  - .sandcastle/reviewer.prompt.md: AC coverage relabeled to "AC
    coverage (acceptance criteria, not test coverage)" to disambiguate;
    new check #7 "Coverage gates (ADR-020)" requiring CI's
    Coverage — diff (L1) step green + per-layer thresholds met +
    no silent allowlist expansion + manifest band drift detection.

Effect: future agent runs through sandcastle now treat coverage as a
first-class blocking gate, parallel to conformance. PRs no longer
discover coverage failures only via CI; the implementer is required
to check before reporting done, and the reviewer is required to
verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-13 16:47:16 +02:00


id, title, type, status, author, elicitation-session, created, shipped, shipping-commits


2026-05-13-coverage-architecture

Agent-first coverage architecture (4 layers + manifest-driven thresholds)

prd

shipped

danijel

brainstorm-2026-05-13

2026-05-13

7eb783a (PRD)

4dce1df (ADR-020 + glossary + hook)

f7baa8b (manifest schema + helper + auth)

412d994 (L1 coverage:diff)

bd5a077 (L2 coverage:aggregate)

39e33eb (CI integration)

15db9c4 (helper rollout blog + marketing-pages)

f4254aa (cookbook guide + generator)

6428f10 (L3 Stryker mutation)

bf0b049 (L0 unification — all 5 features green)

Problem

The template enforces "every use case has a test file" via the usecase-must-have-test-file ESLint rule (structural) and declares per-layer coverage thresholds in each feature's vitest.config.ts (100% on entities/use-cases/controllers; 80/75/80/80 baseline). But:

Agents can ship slices that don't actually exercise the new code. The ESLint rule only checks file presence; coverage % checks what executed. There's no gate that says "this PR's diff was tested."
The declared 100%-on-critical-layers thresholds may be aspirational, not enforced. Their actual green/red state is unverified. CI uploads **/coverage/lcov.info as an artifact but doesn't gate on it.
There's no aggregate visibility. No merged report, no trend, no "is the codebase covered well right now?" answer.
100% coverage with weak assertions is invisible. Tests that import the SUT but barely assert anything pass coverage. The third dimension of test quality — "would my test catch a real regression?" — isn't measured.
Coverage expectations live in 5 separate vitest configs. Drift across features is easy and hard to spot.

For an agent-first template where most code is authored by AI agents in vertical slices, this is the single biggest gap between "feature shipped" and "feature shipped safely."

Goal

Establish a 4-layer coverage architecture that mirrors the existing 5-gate conformance philosophy (multi-latency, machine-readable, agent-first) and makes coverage a first-class conformance signal driven from each feature's feature.manifest.ts.

In scope

A coverage: section in feature.manifest.ts as the single source of truth for per-layer expectations
Vitest config auto-derives test-time thresholds from the manifest
pnpm coverage:diff script — cover-the-diff gate (changed lines must be exercised), machine-readable output for the dispatch loop
pnpm coverage:aggregate script — merges per-package lcov into a root coverage/lcov.info + grep-able coverage/summary.json
coverage/summary.json committed per merge; trend readable from git log
pnpm mutate — Stryker on entities/ + application/use-cases/ only, on-demand (not part of pnpm test)
assertFeatureConformance reads the manifest's coverage band and the package's lcov at boot; fails on drift
CI gate: pnpm test --coverage (existing) + pnpm coverage:diff (new) + pnpm coverage:aggregate (new)
ADR-020 capturing the architecture
docs/guides/coverage.md cookbook
Glossary entries for new vocabulary
Generator update — pnpm turbo gen feature scaffolds the coverage: manifest section
.claude/hooks/prompt-context.sh detects coverage-related prompts and injects pointers

Out of scope

Codecov / SaaS dashboards. Aggregate trend ships as committed coverage/summary.json only. SaaS can be a later ADR.
Coverage badges or PR comments. Optional follow-up if humans want them; agents don't.
Mutation testing on infrastructure / repositories / controllers — entities + use-cases only for v1. Wider scope is a future epic.
Branch coverage on __seeds__/ / __factories__/ / __contracts__/ — already excluded from coverage; stays out.
Coverage for apps/ — they have their own existing thresholds; not part of this initiative.
Test quality beyond coverage + mutation — property-based tests, fuzzing, etc. are separate concerns.

Constraints

ADR-014, ADR-017 (instrumentation): coverage instrumentation must not interfere with span/log collection.
ADR-011 (TDD foundation): the declared per-layer thresholds (entities/use-cases/controllers at 100%; baseline 80/75/80/80) are existing decisions; we honor them and centralize their declaration.
The conformance ESLint rules are AST-time + filesystem; coverage assertions are runtime data — they must remain in separate gates.
Generator-first is non-negotiable: any new file added to a feature must come from pnpm turbo gen feature or a sibling generator.
pnpm conformance and pnpm test already take ~120s and ~90s respectively; new gates must not add more than ~30s wall time to the default loop.
Agent dispatch (pnpm work dispatch) must be able to read coverage results as JSON without a network call.

Success criteria

pnpm test --coverage passes green across all five feature packages (verifies L0 baseline is real, not aspirational).
pnpm coverage:diff exits non-zero when a changed line is uncovered; outputs JSON to stdout listing each uncovered hunk with file + line range.
pnpm coverage:aggregate produces coverage/lcov.info + coverage/summary.json at the repo root; both readable in <1s.
coverage/summary.json is committed and git log -- coverage/summary.json shows trend over time.
pnpm mutate --filter <feature> runs Stryker on entities + use-cases; produces a per-feature mutation score.
feature.manifest.ts includes coverage: { ... }; removing or weakening a band fails pnpm dev boot via assertFeatureConformance.
CI fails when (a) a per-layer threshold is breached, (b) any changed line is uncovered, or (c) the aggregate report can't be produced.
ADR-020 + docs/guides/coverage.md + glossary entries land alongside implementation.
pnpm turbo gen feature emits a manifest with the coverage: section pre-populated to defaults.

User stories

As an AI implementer agent, I want pnpm coverage:diff to exit with a precise JSON list of uncovered hunks, so that I can immediately add the missing test without searching.
As an AI implementer agent, I want pnpm dev to refuse to boot if I've authored a manifest entry without backing tests, so that I cannot ship an untested slice.
As an AI reviewer agent, I want the dispatch loop to surface coverage drift as part of its post-task verification, so that I can flag the slice for revision.
As a human reviewer, I want coverage/summary.json committed on merge, so that I can see coverage trend via git log without leaving the repo.
As a template maintainer, I want each feature's feature.manifest.ts to declare its coverage band, so that I have one place to read or change expectations.
As a template maintainer, I want pnpm mutate to surface tests that don't actually assert, so that 100% line coverage can't paper over weak tests.
As a template adopter, I want pnpm turbo gen feature to scaffold the coverage: manifest section with sensible defaults, so that I don't have to remember the shape.
As a future agent in a session, I want the prompt-context hook to inject coverage pointers when I mention "coverage" or "uncovered" in a prompt, so that the relevant ADR + guide load automatically.
As an on-call engineer, I want coverage/summary.json to include a timestamp + commit SHA, so that I can correlate coverage state with deploys.
As an AI agent receiving a handoff, I want the coverage state of the in-flight slice to be readable from a known path, so that I can continue without re-running the suite.

Implementation decisions

Architecture: 4 layers, mirroring the 5-gate conformance philosophy

Layer	What it catches	Latency	Runs in
L0 Per-layer vitest thresholds	Drift below declared bands (e.g., entities < 100%)	~5–30s per package	`pnpm test --coverage` (existing)
L1 Diff coverage	"Changed line was not exercised"	~5s after L0	`pnpm coverage:diff`; CI gate; dispatch post-task
L2 Aggregate trend	"Codebase coverage drifted over time"	~10s	`pnpm coverage:aggregate`; committed `coverage/summary.json`
L3 Mutation testing	"Test exists, executes the code, but asserts nothing"	Minutes	`pnpm mutate`; on-demand; nightly GH Action

Manifest-driven coverage band (the keystone)

A coverage: section in feature.manifest.ts is the single source of truth. Default scaffolded shape:

entities: 100/100/100/100
use-cases: 100/95/100/100
controllers: 100/95/100/100
baseline: 80/75/80/80
mutationTargets: ["entities", "use-cases"]

Three readers:

Vitest: vitest.config.ts imports the manifest and emits its coverage.thresholds. This eliminates today's duplication of thresholds across per-feature configs.
assertFeatureConformance — reads coverage/lcov.info for the package at boot, asserts each band. Fails to boot on drift (matching the existing brand-assertion shape).
pnpm coverage:diff — uses baseline as the default expectation for non-layer-tagged files; stricter bands override per-path.
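
A minimal sketch of how the Vitest reader could derive thresholds from the manifest, not the shipped helper. It assumes the four slash-separated numbers are ordered statements/branches/functions/lines (the key order used in coverage/summary.json). The names `CoverageManifest`, `parseBand`, `toVitestThresholds`, and the `src/` prefix on the layer globs are hypothetical. It relies on Vitest accepting glob keys inside coverage.thresholds.

```typescript
// Hypothetical shape of the manifest's `coverage:` section (names are assumptions).
type Band = { statements: number; branches: number; functions: number; lines: number };

interface CoverageManifest {
  entities: string;
  "use-cases": string;
  controllers: string;
  baseline: string;
  mutationTargets: string[];
}

// Assumed order of the four numbers: statements/branches/functions/lines.
function parseBand(band: string): Band {
  const parts = band.split("/").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isFinite(n) || n < 0 || n > 100)) {
    throw new Error(`Invalid coverage band: "${band}"`);
  }
  const [statements, branches, functions, lines] = parts as [number, number, number, number];
  return { statements, branches, functions, lines };
}

// Layer directories are named in the PRD; the `src/` prefix is an assumption.
const LAYER_GLOBS = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
} as const;

// Baseline becomes the global thresholds; each layer band becomes a glob-keyed override.
function toVitestThresholds(coverage: CoverageManifest): Record<string, number | Band> {
  const thresholds: Record<string, number | Band> = { ...parseBand(coverage.baseline) };
  for (const layer of Object.keys(LAYER_GLOBS) as (keyof typeof LAYER_GLOBS)[]) {
    thresholds[LAYER_GLOBS[layer]] = parseBand(coverage[layer]);
  }
  return thresholds;
}
```

In a vitest.config.ts the result would be passed as `test.coverage.thresholds`, so changing a band in the manifest changes the test-time gate without touching the config.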

Diff coverage algorithm

pnpm coverage:diff [<base-ref>] (default base ref origin/main, compared as origin/main...HEAD):

Run git diff --unified=0 --no-color <base>...HEAD → list of (file, hunk-start, hunk-end) for changed/added lines.
Read merged coverage/lcov.info → executable-line + execution-count per file.
For each changed line that is executable (skip comments, types, exports, declarations): assert execution count > 0.
Output stdout JSON { status: "pass" | "fail", uncovered: [{ file, line, kind }] }.
Output stderr human-readable summary.
Exit 0 on pass, 1 on fail.

Filename allowlist (files that are not gated on diff coverage): *.test.ts, *.test.tsx, *.config.*, *.md, *.json, *.mjs, *.cjs. The paths that L0's existing per-package coverage config already excludes (DI bootstrap, interfaces, CMS, UI, factories, contracts) are exempt as well.
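
The steps above can be sketched as three pure functions over the diff text and the lcov text. This is an illustration under assumptions, not the contents of scripts/coverage/diff.mjs: the function names are hypothetical, the `kind` value "uncovered-line" is a placeholder (the PRD does not list the enum's values), the allowlist is left out, and files with no lcov record are skipped where a real gate would have to decide how to treat them.

```typescript
type Uncovered = { file: string; line: number; kind: string };
type DiffResult = { status: "pass" | "fail"; uncovered: Uncovered[] };

// Parse `git diff --unified=0` output into file -> added/changed line numbers.
// Simplification: an added line whose content begins with "++ " would be misread as a header.
function parseChangedLines(diff: string): Map<string, number[]> {
  const changed = new Map<string, number[]>();
  let current: string | null = null;
  for (const row of diff.split("\n")) {
    if (row.startsWith("+++ ")) {
      // "+++ b/<path>" names the new side; "+++ /dev/null" marks a deleted file.
      current = row.startsWith("+++ b/") ? row.slice(6) : null;
      continue;
    }
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/.exec(row);
    if (!hunk || current === null) continue;
    const start = Number(hunk[1]);
    const count = hunk[2] === undefined ? 1 : Number(hunk[2]); // count 0 = pure deletion
    const lines = changed.get(current) ?? [];
    for (let i = 0; i < count; i++) lines.push(start + i);
    changed.set(current, lines);
  }
  return changed;
}

// Parse lcov `SF:<file>` and `DA:<line>,<hits>` records into file -> (line -> hits).
function parseLcov(lcov: string): Map<string, Map<number, number>> {
  const files = new Map<string, Map<number, number>>();
  let hits: Map<number, number> | null = null;
  for (const row of lcov.split("\n")) {
    if (row.startsWith("SF:")) {
      hits = new Map<number, number>();
      files.set(row.slice(3).trim(), hits);
    } else if (row.startsWith("DA:") && hits) {
      const [line, count] = row.slice(3).split(",").map(Number);
      if (line !== undefined && count !== undefined) hits.set(line, count);
    }
  }
  return files;
}

function diffCoverage(diff: string, lcov: string): DiffResult {
  const coverage = parseLcov(lcov);
  const uncovered: Uncovered[] = [];
  parseChangedLines(diff).forEach((lines, file) => {
    const hits = coverage.get(file);
    if (!hits) return; // no lcov record for this file: not gated in this sketch
    for (const line of lines) {
      // Lines without a DA entry are non-executable (comments, types) and are skipped.
      if (hits.get(line) === 0) uncovered.push({ file, line, kind: "uncovered-line" });
    }
  });
  return { status: uncovered.length === 0 ? "pass" : "fail", uncovered };
}
```

A runner would print `JSON.stringify(result)` to stdout, a readable summary to stderr, and exit with 1 when `status` is "fail". Keeping the parsing pure is what makes the fixture-based tests described under Testing decisions cheap to write.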

Aggregate report

pnpm coverage:aggregate:

Collect every packages/**/coverage/lcov.info + apps/**/coverage/lcov.info.
Merge into coverage/lcov.info via lcov-result-merger or monocart-coverage-reports.

Emit coverage/summary.json:

{
  "generatedAt": "2026-05-13T12:34:56Z",
  "commit": "abc1234",
  "repo": { "statements": 87.4, "branches": 81.2, "functions": 89.0, "lines": 87.4 },
  "byPackage": {
    "@repo/auth": { "statements": 96.1, ... },
    ...
  }
}

Re-render HTML at coverage/html/index.html for human drill-down (gitignored).

coverage/summary.json is committed; coverage/lcov.info + coverage/html/ are gitignored.
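
As a rough sketch of the summary step, the repo-level numbers can be computed from the merged lcov text by summing the per-record totals. The function names are hypothetical and `byPackage` is omitted. Because lcov has no statement records, this sketch mirrors the line percentage into `statements`, which is an assumption about how the shipped script fills that field.

```typescript
type Metrics = { statements: number; branches: number; functions: number; lines: number };

// Percentage rounded to one decimal; an empty denominator counts as fully covered.
function pct(hit: number, found: number): number {
  return found === 0 ? 100 : Math.round((hit / found) * 1000) / 10;
}

// Sum LF/LH (lines), BRF/BRH (branches), and FNF/FNH (functions) over all lcov records.
function summarizeLcov(lcov: string): Metrics {
  const totals: Record<string, number> = { LF: 0, LH: 0, BRF: 0, BRH: 0, FNF: 0, FNH: 0 };
  for (const row of lcov.split("\n")) {
    const match = /^(LF|LH|BRF|BRH|FNF|FNH):(\d+)/.exec(row);
    if (!match) continue;
    const key = match[1] as string;
    totals[key] = (totals[key] ?? 0) + Number(match[2]);
  }
  const lines = pct(totals["LH"] ?? 0, totals["LF"] ?? 0);
  return {
    statements: lines, // assumption: lcov carries no statement metric, so lines stand in
    branches: pct(totals["BRH"] ?? 0, totals["BRF"] ?? 0),
    functions: pct(totals["FNH"] ?? 0, totals["FNF"] ?? 0),
    lines,
  };
}

// Build the top-level object; commit and timestamp are passed in to keep the function pure.
function buildSummary(lcov: string, commit: string, generatedAt: string) {
  return { generatedAt, commit, repo: summarizeLcov(lcov) };
}
```

Passing the commit SHA and timestamp in as arguments keeps the function deterministic, so a fixture test can assert the exact object that would be serialized to coverage/summary.json.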

Mutation testing

pnpm mutate [--filter @repo/<feature>]:

Uses Stryker with @stryker-mutator/vitest-runner.
Per-feature stryker.config.json scaffolded by the generator.
Mutation scope honors the manifest's mutationTargets (entities + use-cases by default).
Output: reports/mutation/<feature>/ (HTML + JSON; gitignored).
Mutation score threshold: 80% per feature (tunable in manifest); not enforced in default pnpm test.
Optional nightly GH Action: mutation-nightly.yml — runs across all features, opens an issue on score drop > 5%.
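
One way the generator could turn `mutationTargets` into a per-feature Stryker config is sketched below. The function name, the `src/` prefix, and the report file names are hypothetical. Mapping the manifest threshold onto Stryker's `thresholds.break` (which makes the Stryker run exit non-zero below that score) is an assumption about enforcement, not something the PRD specifies.

```typescript
// Layer directories are named in the PRD; the `src/` prefix is an assumption.
const MUTATION_LAYER_DIRS: Record<string, string> = {
  entities: "src/entities",
  "use-cases": "src/application/use-cases",
};

// Returns an object suitable for serializing to stryker.config.json.
function buildStrykerConfig(feature: string, mutationTargets: string[], mutationThreshold = 80) {
  const mutate: string[] = [];
  for (const target of mutationTargets) {
    const dir = MUTATION_LAYER_DIRS[target];
    if (dir === undefined) throw new Error(`Unknown mutation target: ${target}`);
    // Mutate source files in the layer, but never the tests themselves.
    mutate.push(`${dir}/**/*.ts`, `!${dir}/**/*.test.ts`);
  }
  return {
    testRunner: "vitest",
    plugins: ["@stryker-mutator/vitest-runner"],
    mutate,
    reporters: ["html", "json", "clear-text"],
    htmlReporter: { fileName: `reports/mutation/${feature}/index.html` },
    jsonReporter: { fileName: `reports/mutation/${feature}/mutation.json` },
    // Assumption: the manifest threshold maps to Stryker's failing score.
    thresholds: { break: mutationThreshold },
  };
}
```

Because only `pnpm mutate` invokes Stryker, a breaking threshold here would not affect the default `pnpm test` loop.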

Boot-time assertion

assertFeatureConformance (in core-shared/conformance/) gains a new check:

Look for coverage/lcov.info at the feature package root.
If absent: skip with a logger.debug(...) (dev mode is allowed to lack coverage data).
If present: parse, compute per-layer % for entities/, application/use-cases/, interface-adapters/controllers/.
Compare against the manifest's coverage: bands. Fail boot on breach with a CoverageDriftError.

Graceful degradation in dev (USE_DEV_SEED=true): the assertion logs a warning instead of throwing, so pnpm dev boots without a fresh coverage run.
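
The check above could look roughly like the following. Only `CoverageDriftError` and the three layer paths come from the PRD. The `FileCoverage` shape, the `assertCoverageBands` name, substring matching on paths, and comparing only the line percentage are simplifying assumptions; lcov parsing is assumed to happen upstream.

```typescript
class CoverageDriftError extends Error {
  layer: string;
  actual: number;
  expected: number;
  constructor(layer: string, actual: number, expected: number) {
    super(`Coverage drift in ${layer}: lines ${actual.toFixed(2)}% < ${expected}%`);
    this.name = "CoverageDriftError";
    this.layer = layer;
    this.actual = actual;
    this.expected = expected;
  }
}

// Hypothetical per-file totals, already parsed from the package's coverage/lcov.info.
type FileCoverage = { file: string; linesFound: number; linesHit: number };

// Layer paths are from the PRD; matching them as substrings is an assumption.
const LAYER_MARKERS: Record<string, string> = {
  entities: "/entities/",
  "use-cases": "/application/use-cases/",
  controllers: "/interface-adapters/controllers/",
};

function assertCoverageBands(
  records: FileCoverage[] | null, // null = no coverage/lcov.info found
  expectedLines: Record<string, number>, // layer -> required line %
  options: { devMode: boolean; warn: (message: string) => void },
): void {
  if (records === null) return; // absent coverage data: skip the check
  for (const layer of Object.keys(LAYER_MARKERS)) {
    const marker = LAYER_MARKERS[layer] as string;
    const expected = expectedLines[layer];
    if (expected === undefined) continue;
    let found = 0;
    let hit = 0;
    for (const record of records) {
      if (record.file.includes(marker)) {
        found += record.linesFound;
        hit += record.linesHit;
      }
    }
    if (found === 0) continue; // the layer has no executable lines
    const actual = (hit / found) * 100;
    if (actual >= expected) continue;
    const error = new CoverageDriftError(layer, actual, expected);
    // Dev mode degrades to a warning so boot is not blocked by stale data.
    if (options.devMode) options.warn(error.message);
    else throw error;
  }
}
```

Taking `devMode` and `warn` as parameters, rather than reading the environment inside the function, lets a fixture test exercise both the throwing and the warning path without a real boot.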

CI integration

.github/workflows/ci.yml gains the following steps after the existing test step (the last one lives in a separate workflow):

pnpm coverage:aggregate (always)
pnpm coverage:diff (always; fails build on uncovered diff)
Upload coverage/lcov.info + coverage/summary.json as artifact (existing flow)
On merge to main only: commit coverage/summary.json back to the repo (separate workflow with permissions: contents: write)

Generator integration

pnpm turbo gen feature template emits:

Manifest with coverage: { ... defaults ... } section at a <gen:coverage> anchor
vitest.config.ts that imports the manifest and derives coverage.thresholds from it
stryker.config.json at the package root, scoped to entities/ + application/use-cases/

The CI guard at packages/core-eslint/anchors.test.js adds the new anchor.

Hook integration

.claude/hooks/prompt-context.sh gains a new keyword group:

if echo "$prompt" | grep -qE 'coverage|uncovered|mutation|stryker|lcov'; then
  inject+=('Coverage: ADR-020 + docs/guides/coverage.md. 4 layers: L0 vitest thresholds, L1 pnpm coverage:diff, L2 coverage/summary.json, L3 pnpm mutate. Manifest-driven via feature.manifest.ts coverage section.')
fi

Testing decisions

A good test for this initiative covers behavior through the public surface:

The diff-coverage script gets tested by feeding it a synthetic git diff + lcov.info and asserting JSON output shape and exit code. Fixtures, not e2e.
The aggregate script gets unit-tested by feeding it N synthetic per-package lcov files and asserting the merged output structure.
assertFeatureConformance gets a new test in core-shared/conformance/ that constructs a manifest + a coverage/lcov.info fixture and asserts pass/fail behavior. No real test run.
Stryker integration gets an integration-test style verification: run pnpm mutate --filter @repo/auth in CI nightly and assert mutation score > 80% on a known-good commit.
The manifest schema gets a Zod schema test: invalid coverage: shapes fail parse with a specific error message.

Modules to add test coverage to:

scripts/coverage/diff.mjs (the diff coverage runner)
scripts/coverage/aggregate.mjs (the merger)
packages/core-shared/src/conformance/assert-coverage.ts (the boot-time reader + comparator)
packages/core-shared/src/conformance/manifest-schema.ts (extended Zod schema)

Prior art to mirror:

scripts/work/state-builder.mjs — for the diff-coverage script's shape (Node ESM, no deps, fixture-based test)
packages/core-eslint/anchors.test.js — for the new anchor's CI guard pattern
Existing assertFeatureConformance brand-assertion code — for the boot-time check shape

Open questions

Q1: Mutation score threshold per feature — start at 80% across the board, or tune per-feature in the manifest? Recommended: start at 80% baseline, allow per-feature override via coverage.mutationThreshold.
Q2: When coverage/lcov.info is stale at boot — fail loudly or warn? Recommended: warn in dev, fail in NODE_ENV=production boot.
Q3: Should pnpm coverage:diff run automatically as a Git pre-push hook? Recommended: no — pre-commit already runs; pre-push adds latency without proportional value. Keep it CI-only + dispatch-loop.
Q4: How is coverage/summary.json committed safely on merge to main? Recommended: separate workflow with a github-actions[bot] identity, permissions: contents: write, skipped if no diff in summary.
Q5: monocart-coverage-reports vs. lcov-result-merger? Recommended: monocart — actively maintained, single-binary, handles V8 + Istanbul, no Python dep.

Out of scope (deferred to future PRDs)

Codecov / SaaS dashboards. Aggregate is committed; SaaS is gold-plating.
Mutation testing on infrastructure/, interface-adapters/, integrations/. Bigger surface, slower runs; revisit after L3 v1 is stable.
Coverage-aware Storybook screenshot diffing. Visual regression is its own concern.
Code review heuristics from coverage (e.g., "this PR touched a 100%-covered file and dropped to 95% — needs review tag"). Possible follow-up via fallow audit.

L0 verification findings (2026-05-13, during brainstorm)

Ran pnpm --filter @repo/<x> test -- --coverage --run per feature. State of the declared 100%-on-entities/use-cases/controllers:

Feature	State	Detail
`@repo/auth`	✅ green	21 tests, 93.7% overall, all per-layer bands hit 100%
`@repo/blog`	✅ green	passed (no threshold errors surfaced)
`@repo/marketing-pages`	✅ green	passed
`@repo/navigation`	❌ real gap	`entities/`: 86.36% lines / 50% functions; `controllers/`: 86.66% lines / 80% branches
`@repo/media`	❌ config + real gap	(1) Missing `@vitest/coverage-v8` dev dep — `--coverage` crashed. (2) `vitest.config.ts` had NO coverage block at all (no per-layer thresholds, no excludes). When the standard block was applied, real gaps surfaced in `controllers/` (one controller at 86.66% lines / 75% branches, lines 19-20 uncovered).

Conclusions reinforcing the PRD:

The L0 layer is real (auth/blog/marketing-pages prove the 100%/100%/95%/100% bar is achievable).
The duplication problem is real (5 features, 4 different vitest configs, and one with no coverage block at all; this is exactly the drift the manifest-driven keystone eliminates).
Real test gaps exist in 2 of 5 features (navigation, media). Fixing them is part of the implementation epic, not a blocker for this design landing.

Patches landed alongside the PRD (non-blocking for CI):

packages/media/package.json — added @vitest/coverage-v8 dev dep so pnpm test -- --coverage no longer crashes.
packages/media/vitest.config.ts — left intentionally minimal (no coverage block); commented to point at the L0 unification story.

The full per-layer block for media + the missing tests in navigation + media land in the L0 unification story of the implementation epic, as the first work item after the manifest schema lands.

Further notes

Builds on: ADR-011 (TDD foundation), ADR-006 (vertical-feature-packages), the existing 5-gate conformance system.
ADR-020 is the durable record of the architecture decisions; this PRD is the implementation seed.
Each layer should land as a separate story (L0 unification, manifest schema + auto-derive, L1 diff, L2 aggregate, L3 mutation, ADR + docs). Estimated effort: one mid-sized epic, 6–8 stories.
This is the canonical example of "agent-first observability": every layer optimized for machine consumption first, human consumption second.
