Convention shift: epic folders + PRD filenames + frontmatter id
fields are now bare slugs. The created: timestamp (Phase 2) carries
the date; folder names don't repeat it. A future <task-id>-<slug>
shape (e.g. ClickUp) lands cleanly when that integration ships.
Renames (git mv preserves history):
- docs/work/2026-05-13-binder-wrap-helper/
-> docs/work/binder-wrap-helper/
- docs/work/2026-05-14-library-evaluation-policy/
-> docs/work/library-evaluation-policy/
- docs/work/2026-05-14-ci-security-and-supply-chain/
-> docs/work/ci-security-and-supply-chain/
- docs/work/prds/2026-05-13-binder-wrap-helper.prd.md
-> docs/work/prds/binder-wrap-helper.prd.md
- docs/work/prds/2026-05-13-coverage-architecture.prd.md
-> docs/work/prds/coverage-architecture.prd.md
- docs/work/prds/2026-05-14-library-evaluation-policy.prd.md
-> docs/work/prds/library-evaluation-policy.prd.md
- docs/work/prds/2026-05-14-ci-security-and-supply-chain.prd.md
-> docs/work/prds/ci-security-and-supply-chain.prd.md
Frontmatter updates inside the renamed files: epic id, epic prd,
story epic, PRD id, PRD builds-on all drop date prefixes.
System folder + state file move:
- New docs/work/_system/ holds framework-managed state.
- docs/work/_state.json -> docs/work/_system/_state.json.
- state-builder.mjs adds _system to SKIP_FOLDERS.
- cli.mjs + state-sync-guard.mjs + .husky/pre-commit point at the
new path.
template-reset-v1 epic deleted entirely (one-off cleanup epic from
the pre-date-convention era; status was already done).
Generator-template updates (so new artifacts ship in the right
shape):
- .sandcastle/decomposer.prompt.md emits bare-slug folder names +
ISO created: timestamp.
- .claude/skills/to-prd/SKILL.md template uses bare-slug filename +
bare-slug id field + ISO created: timestamp.
Doc reference updates: glossary, runbook,
agent-first-workflow-and-conformance, reviewer prompt, ADR-020,
ADR-022, ADR-023 all point at the new paths/slugs.
| id | title | type | status | author | elicitation-session | created | updated | shipped | shipping-commits |
|---|---|---|---|---|---|---|---|---|---|
| coverage-architecture | Agent-first coverage architecture (4 layers + manifest-driven thresholds) | prd | shipped | danijel | brainstorm-2026-05-13 | 2026-05-13T00:00:00Z | 2026-05-14T19:16:52.691Z | 2026-05-13 | |
Problem
The template enforces "every use case has a test file" via the usecase-must-have-test-file ESLint rule (structural) and declares per-layer coverage thresholds in each feature's vitest.config.ts (100% on entities/use-cases/controllers; 80/75/80/80 baseline). But:
- Agents can ship slices that don't actually exercise the new code. The ESLint rule only checks file presence; coverage % checks what executed. There's no gate that says "this PR's diff was tested."
- The declared 100%-on-critical-layers thresholds may be aspirational, not enforced. Their actual green/red state is unverified. CI uploads **/coverage/lcov.info as an artifact but doesn't gate on it.
- There's no aggregate visibility. No merged report, no trend, no "is the codebase covered well right now?" answer.
- 100% coverage with weak assertions is invisible. Tests that import the SUT but barely assert anything pass coverage. The third dimension of test quality — "would my test catch a real regression?" — isn't measured.
- Coverage expectations live in 5 separate vitest configs. Drift across features is easy and hard to spot.
For an agent-first template where most code is authored by AI agents in vertical slices, this is the single biggest gap between "feature shipped" and "feature shipped safely."
Goal
Establish a 4-layer coverage architecture that mirrors the existing 5-gate conformance philosophy (multi-latency, machine-readable, agent-first) and makes coverage a first-class conformance signal driven from each feature's feature.manifest.ts.
In scope
- A coverage: section in feature.manifest.ts as the single source of truth for per-layer expectations
- Vitest config auto-derives test-time thresholds from the manifest
- pnpm coverage:diff script: cover-the-diff gate (changed lines must be exercised), machine-readable output for the dispatch loop
- pnpm coverage:aggregate script: merges per-package lcov into a root coverage/lcov.info + grep-able coverage/summary.json
- coverage/summary.json committed per merge; trend readable from git log
- pnpm mutate: Stryker on entities/ + application/use-cases/ only, on-demand (not part of pnpm test)
- assertFeatureConformance reads the manifest's coverage band and the package's lcov at boot; fails on drift
- CI gate: pnpm test --coverage (existing) + pnpm coverage:diff (new) + pnpm coverage:aggregate (new)
- ADR-020 capturing the architecture
- docs/guides/coverage.md cookbook
- Glossary entries for new vocabulary
- Generator update: pnpm turbo gen feature scaffolds the coverage: manifest section
- .claude/hooks/prompt-context.sh detects coverage-related prompts and injects pointers
Out of scope
- Codecov / SaaS dashboards. Aggregate trend ships as committed coverage/summary.json only. SaaS can be a later ADR.
- Coverage badges or PR comments. Optional follow-up if humans want them; agents don't.
- Mutation testing on infrastructure / repositories / controllers: entities + use-cases only for v1. Wider scope is a future epic.
- Branch coverage on __seeds__/ / __factories__/ / __contracts__/: already excluded from coverage; stays out.
- Coverage for apps/: they have their own existing thresholds; not part of this initiative.
- Test quality beyond coverage + mutation: property-based tests, fuzzing, etc. are separate concerns.
Constraints
- ADR-014, ADR-017 (instrumentation): coverage instrumentation must not interfere with span/log collection.
- ADR-011 (TDD foundation): the declared per-layer thresholds (entities/use-cases/controllers at 100%; baseline 80/75/80/80) are existing decisions; we honor them and centralize their declaration.
- The conformance ESLint rules are AST-time + filesystem; coverage assertions are runtime data — they must remain in separate gates.
- Generator-first is non-negotiable: any new file added to a feature must come from pnpm turbo gen feature or a sibling generator.
- pnpm conformance and pnpm test already take ~120s and ~90s respectively; new gates must not add more than ~30s wall time to the default loop.
- Agent dispatch (pnpm work dispatch) must be able to read coverage results as JSON without a network call.
Success criteria
- pnpm test --coverage passes green across all five feature packages (verifies L0 baseline is real, not aspirational).
- pnpm coverage:diff exits non-zero when a changed line is uncovered; outputs JSON to stdout listing each uncovered hunk with file + line range.
- pnpm coverage:aggregate produces coverage/lcov.info + coverage/summary.json at the repo root; both readable in <1s.
- coverage/summary.json is committed and git log -- coverage/summary.json shows trend over time.
- pnpm mutate --filter <feature> runs Stryker on entities + use-cases; produces a per-feature mutation score.
- feature.manifest.ts includes coverage: { ... }; removing or weakening a band fails pnpm dev boot via assertFeatureConformance.
- CI fails when (a) a per-layer threshold breach happens, (b) any changed line is uncovered, (c) the aggregate report can't be produced.
- ADR-020 + docs/guides/coverage.md + glossary entries land alongside implementation.
- pnpm turbo gen feature emits a manifest with the coverage: section pre-populated to defaults.
User stories
- As an AI implementer agent, I want pnpm coverage:diff to exit with a precise JSON list of uncovered hunks, so that I can immediately add the missing test without searching.
- As an AI implementer agent, I want pnpm dev to refuse to boot if I've authored a manifest entry without backing tests, so that I cannot ship an untested slice.
- As an AI reviewer agent, I want the dispatch loop to surface coverage drift as part of its post-task verification, so that I can flag the slice for revision.
- As a human reviewer, I want coverage/summary.json committed on merge, so that I can see coverage trend via git log without leaving the repo.
- As a template maintainer, I want each feature's feature.manifest.ts to declare its coverage band, so that I have one place to read or change expectations.
- As a template maintainer, I want pnpm mutate to surface tests that don't actually assert, so that 100% line coverage can't paper over weak tests.
- As a template adopter, I want pnpm turbo gen feature to scaffold the coverage: manifest section with sensible defaults, so that I don't have to remember the shape.
- As a future agent in a session, I want the prompt-context hook to inject coverage pointers when I mention "coverage" or "uncovered" in a prompt, so that the relevant ADR + guide load automatically.
- As an on-call engineer, I want coverage/summary.json to include a timestamp + commit SHA, so that I can correlate coverage state with deploys.
- As an AI agent receiving a handoff, I want the coverage state of the in-flight slice to be readable from a known path, so that I can continue without re-running the suite.
Implementation decisions
Architecture: 4 layers, mirroring the 5-gate conformance philosophy
| Layer | What it catches | Latency | Runs in |
|---|---|---|---|
| L0 Per-layer vitest thresholds | Drift below declared bands (e.g., entities < 100%) | ~5–30s per package | pnpm test --coverage (existing) |
| L1 Diff coverage | "Changed line was not exercised" | ~5s after L0 | pnpm coverage:diff; CI gate; dispatch post-task |
| L2 Aggregate trend | "Codebase coverage drifted over time" | ~10s | pnpm coverage:aggregate; committed coverage/summary.json |
| L3 Mutation testing | "Test exists, executes the code, but asserts nothing" | Minutes | pnpm mutate; on-demand; nightly GH Action |
Manifest-driven coverage band (the keystone)
A coverage: section in feature.manifest.ts is the single source of truth. Default scaffolded shape:
- entities: 100/100/100/100
- use-cases: 100/95/100/100
- controllers: 100/95/100/100
- baseline: 80/75/80/80
- mutationTargets: ["entities", "use-cases"]
Three readers:
- Vitest: vitest.config.ts imports the manifest and emits its coverage.thresholds. Eliminates the duplication today.
- assertFeatureConformance: reads coverage/lcov.info for the package at boot, asserts each band. Fails to boot on drift (matching the existing brand-assertion shape).
- pnpm coverage:diff: uses baseline as the default expectation for non-layer-tagged files; stricter bands override per-path.
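As a minimal sketch of the first reader, the band shorthand above can be expanded into the object shape that Vitest accepts under coverage.thresholds, where glob-pattern keys carry stricter per-path thresholds. Everything named here is hypothetical: the CoverageSection type, the band() helper, the src/ prefix on the layer globs, and the assumption that the four numbers are ordered statements/branches/functions/lines (the PRD does not state the order).

```typescript
// Hypothetical types: the PRD fixes only the band names and default values.
type Band = { statements: number; branches: number; functions: number; lines: number };

interface CoverageSection {
  entities: Band;
  "use-cases": Band;
  controllers: Band;
  baseline: Band;
  mutationTargets: string[];
}

// Expand the "100/95/100/100" shorthand.
// ASSUMPTION: order is statements/branches/functions/lines.
function band(shorthand: string): Band {
  const parts = shorthand.split("/").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isFinite(n) || n < 0 || n > 100)) {
    throw new Error(`Invalid coverage band: ${shorthand}`);
  }
  const [statements, branches, functions, lines] = parts;
  return { statements, branches, functions, lines };
}

const defaultCoverage: CoverageSection = {
  entities: band("100/100/100/100"),
  "use-cases": band("100/95/100/100"),
  controllers: band("100/95/100/100"),
  baseline: band("80/75/80/80"),
  mutationTargets: ["entities", "use-cases"],
};

// Layer directories come from the PRD; the src/ prefix is an assumption.
const layerGlobs = {
  entities: "src/entities/**",
  "use-cases": "src/application/use-cases/**",
  controllers: "src/interface-adapters/controllers/**",
} as const;

// Baseline becomes the top-level thresholds; each layer becomes a glob-keyed override.
function toVitestThresholds(c: CoverageSection): Record<string, Band | number> {
  const thresholds: Record<string, Band | number> = { ...c.baseline };
  for (const layer of Object.keys(layerGlobs) as (keyof typeof layerGlobs)[]) {
    thresholds[layerGlobs[layer]] = c[layer];
  }
  return thresholds;
}
```

In a vitest.config.ts the result would be passed as test.coverage.thresholds, so the manifest stays the only place where the numbers are written down.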
Diff coverage algorithm
pnpm coverage:diff [<base-ref>] (default base ref: origin/main, so the compared range is origin/main...HEAD):
- Run git diff --unified=0 --no-color <base-ref>...HEAD → list of (file, hunk-start, hunk-end) for changed/added lines.
- Read merged coverage/lcov.info → executable-line + execution-count per file.
- For each changed line that is executable (skip comments, types, exports, declarations): assert execution count > 0.
- Output stdout JSON { status: "pass" | "fail", uncovered: [{ file, line, kind }] }.
- Output stderr human-readable summary.
- Exit 0 on pass, 1 on fail.
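The steps above can be sketched as three pure functions that take the diff text and the lcov text as strings, which also matches the fixture-based testing approach. This is an illustration under stated assumptions, not the planned scripts/coverage/diff.mjs: the function names are hypothetical, kind is given the placeholder value "line" because the PRD does not define its values, and SF: paths in lcov are assumed to be repo-relative so that they match git's paths.

```typescript
type Uncovered = { file: string; line: number; kind: string };
type DiffResult = { status: "pass" | "fail"; uncovered: Uncovered[] };

// Collect added/changed line numbers per file from `git diff --unified=0` output.
function changedLines(diff: string): Map<string, number[]> {
  const result = new Map<string, number[]>();
  let file: string | null = null;
  for (const row of diff.split("\n")) {
    const fileMatch = /^\+\+\+ (?:b\/(.+)|\/dev\/null)/.exec(row);
    if (fileMatch) {
      file = fileMatch[1] ?? null; // deleted files have no new side
      continue;
    }
    // Hunk header: @@ -old[,count] +new[,count] @@ (count defaults to 1).
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/.exec(row);
    if (hunk && file) {
      const start = Number(hunk[1]);
      const count = hunk[2] === undefined ? 1 : Number(hunk[2]);
      const lines = result.get(file) ?? [];
      for (let i = 0; i < count; i++) lines.push(start + i);
      result.set(file, lines);
    }
  }
  return result;
}

// Read DA:<line>,<hits> records per SF:<file> section of an lcov tracefile.
function lcovHits(lcov: string): Map<string, Map<number, number>> {
  const result = new Map<string, Map<number, number>>();
  let current: Map<number, number> | null = null;
  for (const row of lcov.split("\n")) {
    if (row.startsWith("SF:")) {
      current = new Map<number, number>();
      result.set(row.slice(3).trim(), current);
    } else if (row.startsWith("DA:") && current) {
      const [line, hits] = row.slice(3).split(",").map(Number);
      current.set(line, hits);
    } else if (row.trim() === "end_of_record") {
      current = null;
    }
  }
  return result;
}

function diffCoverage(diff: string, lcov: string): DiffResult {
  const hits = lcovHits(lcov);
  const uncovered: Uncovered[] = [];
  for (const [file, lines] of changedLines(diff)) {
    const fileHits = hits.get(file);
    if (!fileHits) continue; // ASSUMPTION: no lcov section means the file is excluded
    for (const line of lines) {
      // A line without a DA record is not executable (comment, type, declaration).
      if (fileHits.get(line) === 0) uncovered.push({ file, line, kind: "line" });
    }
  }
  return { status: uncovered.length === 0 ? "pass" : "fail", uncovered };
}
```

A thin CLI wrapper would print JSON.stringify(result) to stdout, write the human-readable summary to stderr, and exit with 0 or 1 depending on status.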
Filename allowlist (don't gate on diff coverage): *.test.ts, *.test.tsx, *.config.*, *.md, *.json, *.mjs, *.cjs. The gate also applies the same exclude list as L0's existing per-package coverage excludes (DI bootstrap, interfaces, CMS, UI, factories, contracts).
Aggregate report
pnpm coverage:aggregate:
- Collect every packages/**/coverage/lcov.info + apps/**/coverage/lcov.info.
- Merge into coverage/lcov.info via lcov-result-merger or monocart-coverage-reports.
- Emit coverage/summary.json:
    {
      "generatedAt": "2026-05-13T12:34:56Z",
      "commit": "abc1234",
      "repo": { "statements": 87.4, "branches": 81.2, "functions": 89.0, "lines": 87.4 },
      "byPackage": { "@repo/auth": { "statements": 96.1, ... }, ... }
    }
- Re-render HTML at coverage/html/index.html for human drill-down (gitignored).
coverage/summary.json is committed; coverage/lcov.info + coverage/html/ are gitignored.
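One way the summary.json shape could be produced is from the per-record totals that lcov tracefiles already carry (LF/LH for lines, FNF/FNH for functions, BRF/BRH for branches). The sketch below is hypothetical throughout: the function names are invented, the input is a map of package name to lcov text rather than files on disk, and because lcov has no statement counter the statements figure is assumed to equal lines.

```typescript
type Metrics = { statements: number; branches: number; functions: number; lines: number };
type Summary = {
  generatedAt: string;
  commit: string;
  repo: Metrics;
  byPackage: Record<string, Metrics>;
};
type Totals = { LF: number; LH: number; FNF: number; FNH: number; BRF: number; BRH: number };

const zeroTotals = (): Totals => ({ LF: 0, LH: 0, FNF: 0, FNH: 0, BRF: 0, BRH: 0 });

// Sum the found/hit counters across every record in one lcov tracefile.
function totals(lcov: string): Totals {
  const t = zeroTotals();
  for (const row of lcov.split("\n")) {
    const m = /^(LF|LH|FNF|FNH|BRF|BRH):(\d+)$/.exec(row.trim());
    if (m) t[m[1] as keyof Totals] += Number(m[2]);
  }
  return t;
}

// Percentage rounded to one decimal; an empty denominator counts as fully covered.
const pct = (hit: number, found: number): number =>
  found === 0 ? 100 : Math.round((hit / found) * 1000) / 10;

function metrics(t: Totals): Metrics {
  const lines = pct(t.LH, t.LF);
  // ASSUMPTION: lcov has no statement counter, so statements mirrors lines.
  return { statements: lines, branches: pct(t.BRH, t.BRF), functions: pct(t.FNH, t.FNF), lines };
}

function buildSummary(perPackage: Record<string, string>, commit: string, now: Date): Summary {
  const repo = zeroTotals();
  const byPackage: Record<string, Metrics> = {};
  for (const name of Object.keys(perPackage)) {
    const t = totals(perPackage[name]);
    byPackage[name] = metrics(t);
    // Repo-wide figures come from summed counters, not from averaged percentages.
    for (const key of Object.keys(repo) as (keyof Totals)[]) repo[key] += t[key];
  }
  return { generatedAt: now.toISOString(), commit, repo: metrics(repo), byPackage };
}
```

Summing raw counters before computing percentages keeps a small package from skewing the repo-level numbers the way an average of per-package percentages would.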
Mutation testing
pnpm mutate [--filter @repo/<feature>]:
- Uses Stryker with @stryker-mutator/vitest-runner.
- Per-feature stryker.config.json scaffolded by the generator.
- Mutation scope honors the manifest's mutationTargets (entities + use-cases by default).
- Output: reports/mutation/<feature>/ (HTML + JSON; gitignored).
- Mutation score threshold: 80% per feature (tunable in manifest); not enforced in default pnpm test.
- Optional nightly GH Action: mutation-nightly.yml runs across all features and opens an issue on score drop > 5%.
Boot-time assertion
assertFeatureConformance (in core-shared/conformance/) gains a new check:
- Look for coverage/lcov.info at the feature package root.
- If absent: skip with a logger.debug(...) (dev mode is allowed to lack coverage data).
- If present: parse, compute per-layer % for entities/, application/use-cases/, interface-adapters/controllers/.
- Compare against the manifest's coverage: bands. Fail boot on breach with a CoverageDriftError.
Graceful degradation in dev (USE_DEV_SEED=true): the assertion logs a warning instead of throwing, so pnpm dev boots without a fresh coverage run.
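The comparison and the dev-mode downgrade could look roughly like the sketch below. It assumes the lcov file has already been parsed into per-file line counts and, for brevity, checks only the lines metric of each band. CoverageDriftError is named in the PRD, but its fields, the assertCoverage signature, and the devSeed/warn options are hypothetical.

```typescript
// Error type named in the PRD; the breaches field is an assumption.
class CoverageDriftError extends Error {
  breaches: string[];
  constructor(breaches: string[]) {
    super(`Coverage drift: ${breaches.join("; ")}`);
    this.name = "CoverageDriftError";
    this.breaches = breaches;
  }
}

// Hypothetical pre-parsed input: executable lines found/hit per source file.
type FileLines = { file: string; found: number; hit: number };

// Layer directories as listed in the PRD.
const layerDirs: Record<string, string> = {
  entities: "entities/",
  "use-cases": "application/use-cases/",
  controllers: "interface-adapters/controllers/",
};

function assertCoverage(
  files: FileLines[],
  minLines: Record<string, number>, // required line % per layer, from the manifest
  opts: { devSeed: boolean; warn: (message: string) => void },
): void {
  const breaches: string[] = [];
  for (const layer of Object.keys(layerDirs)) {
    const inLayer = files.filter((f) => f.file.includes(layerDirs[layer]));
    const found = inLayer.reduce((n, f) => n + f.found, 0);
    if (found === 0) continue; // nothing executable recorded for this layer
    const hit = inLayer.reduce((n, f) => n + f.hit, 0);
    const actual = (hit / found) * 100;
    const min = minLines[layer] ?? 0;
    if (actual < min) breaches.push(`${layer} lines ${actual.toFixed(2)}% < ${min}%`);
  }
  if (breaches.length === 0) return;
  const error = new CoverageDriftError(breaches);
  if (opts.devSeed) {
    opts.warn(error.message); // graceful degradation: warn instead of failing boot
    return;
  }
  throw error;
}
```

The caller would handle the absent-file case before this point (skip with a debug log) and pass devSeed from USE_DEV_SEED, so the function itself stays free of filesystem and environment access and is easy to test with fixtures.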
CI integration
.github/workflows/ci.yml gains three steps after the existing test step, plus a separate merge-only workflow:
- pnpm coverage:aggregate (always)
- pnpm coverage:diff (always; fails build on uncovered diff)
- Upload coverage/lcov.info + coverage/summary.json as artifact (existing flow)
- On merge to main only: commit coverage/summary.json back to the repo (separate workflow with permissions: contents: write)
Generator integration
pnpm turbo gen feature template emits:
- Manifest with coverage: { ... defaults ... } section at a <gen:coverage> anchor
- vitest.config.ts that imports the manifest and derives coverage.thresholds from it
- stryker.config.json at the package root, scoped to entities/ + application/use-cases/
The CI guard at packages/core-eslint/anchors.test.js adds the new anchor.
Hook integration
.claude/hooks/prompt-context.sh gains a new keyword group:
if echo "$prompt" | grep -qE 'coverage|uncovered|mutation|stryker|lcov'; then
inject+=('Coverage: ADR-020 + docs/guides/coverage.md. 4 layers: L0 vitest thresholds, L1 pnpm coverage:diff, L2 coverage/summary.json, L3 pnpm mutate. Manifest-driven via feature.manifest.ts coverage section.')
fi
Testing decisions
A good test for this initiative covers behavior through the public surface:
- The diff-coverage script gets tested by feeding it a synthetic git diff + lcov.info and asserting JSON output shape and exit code. Fixtures, not e2e.
- The aggregate script gets unit-tested by feeding it N synthetic per-package lcov files and asserting the merged output structure.
- assertFeatureConformance gets a new test in core-shared/conformance/ that constructs a manifest + a coverage/lcov.info fixture and asserts pass/fail behavior. No real test run.
- Stryker integration gets an integration-test style verification: run pnpm mutate --filter @repo/auth in CI nightly and assert mutation score > 80% on a known-good commit.
- The manifest schema gets a Zod schema test: invalid coverage: shapes fail parse with a specific error message.
Modules to add test coverage to:
- scripts/coverage/diff.mjs (the diff coverage runner)
- scripts/coverage/aggregate.mjs (the merger)
- packages/core-shared/src/conformance/assert-coverage.ts (the boot-time reader + comparator)
- packages/core-shared/src/conformance/manifest-schema.ts (extended Zod schema)
Prior art to mirror:
- scripts/work/state-builder.mjs: for the diff-coverage script's shape (Node ESM, no deps, fixture-based test)
- packages/core-eslint/anchors.test.js: for the new anchor's CI guard pattern
- Existing assertFeatureConformance brand-assertion code: for the boot-time check shape
Open questions
- Q1: Mutation score threshold per feature: start at 80% across the board, or tune per-feature in the manifest? Recommended: start at 80% baseline, allow per-feature override via coverage.mutationThreshold.
- Q2: When coverage/lcov.info is stale at boot, fail loudly or warn? Recommended: warn in dev, fail in NODE_ENV=production boot.
- Q3: Should pnpm coverage:diff run automatically as a Git pre-push hook? Recommended: no. Pre-commit already runs; pre-push adds latency without proportional value. Keep it CI-only + dispatch-loop.
- Q4: How is coverage/summary.json committed safely on merge to main? Recommended: separate workflow with a github-actions[bot] identity, permissions: contents: write, skipped if no diff in summary.
- Q5: monocart-coverage-reports vs. lcov-result-merger? Recommended: monocart (actively maintained, single-binary, handles V8 + Istanbul, no Python dep).
Out of scope (deferred to future PRDs)
- Codecov / SaaS dashboards. Aggregate is committed; SaaS is gold-plating.
- Mutation testing on infrastructure/, interface-adapters/, integrations/. Bigger surface, slower runs; revisit after L3 v1 is stable.
- Coverage-aware Storybook screenshot diffing. Visual regression is its own concern.
- Code review heuristics from coverage (e.g., "this PR touched a 100%-covered file and dropped to 95% — needs review tag"). Possible follow-up via fallow audit.
L0 verification findings (2026-05-13, during brainstorm)
Ran pnpm --filter @repo/<x> test -- --coverage --run per feature. State of the declared 100%-on-entities/use-cases/controllers:
| Feature | State | Detail |
|---|---|---|
| @repo/auth | ✅ green | 21 tests, 93.7% overall, all per-layer bands hit 100% |
| @repo/blog | ✅ green | passed (no threshold errors surfaced) |
| @repo/marketing-pages | ✅ green | passed |
| @repo/navigation | ❌ real gap | entities/: 86.36% lines / 50% functions; controllers/: 86.66% lines / 80% branches |
| @repo/media | ❌ config + real gap | (1) Missing @vitest/coverage-v8 dev dep; --coverage crashed. (2) vitest.config.ts had NO coverage block at all (no per-layer thresholds, no excludes). When the standard block was applied, real gaps surfaced in controllers/ (one controller at 86.66% lines / 75% branches, lines 19-20 uncovered). |
Conclusions reinforcing the PRD:
- The L0 layer is real (auth/blog/marketing-pages prove the 100%/100%/95%/100% bar is achievable).
- The duplication problem is real (5 features, 4 different vitest configs, one entirely lacking the coverage block: exactly the drift the manifest-driven keystone eliminates).
- Real test gaps exist in 2 of 5 features (navigation, media). Fixing them is part of the implementation epic, not a blocker for this design landing.
Patches landed alongside the PRD (non-blocking for CI):
- packages/media/package.json: added @vitest/coverage-v8 dev dep so pnpm test -- --coverage no longer crashes.
- packages/media/vitest.config.ts: left intentionally minimal (no coverage block); commented to point at the L0 unification story.
The full per-layer block for media + the missing tests in navigation + media land in the L0 unification story of the implementation epic, as the first work item after the manifest schema lands.
Further notes
- Builds on: ADR-011 (TDD foundation), ADR-006 (vertical-feature-packages), the existing 5-gate conformance system.
- ADR-020 is the durable record of the architecture decisions; this PRD is the implementation seed.
- Each layer should land as a separate story (L0 unification, manifest schema + auto-derive, L1 diff, L2 aggregate, L3 mutation, ADR + docs). Estimated effort: one mid-sized epic, 6–8 stories.
- This is the canonical example of "agent-first observability": every layer optimized for machine consumption first, human consumption second.