Plan 10 documented R44 (capture at originating-throw layer) but only the R43 repo leg was wired. captureException had zero call sites in any controller or use-case body. This commit closes the gap. Mechanism: - Extract __sentryReported flag helpers into core-shared/instrumentation/ reported-flag.ts. SentryLogger switches to importing them; RecordingLogger carries an inlined copy (tooling → core boundary disallows the import). - Add withCapture(logger, tags, fn) higher-order wrapper paralleling withSpan. On throw: capture-with-tags, mark, re-throw. Bail if the flag was already set — covers the bubbled-from-repo case so each error surfaces in the logger exactly once with the inner-most layer's tags. - Apply withSpan(withCapture(factory)) in every feature's bind-production and bind-dev-seed: auth (3 use cases × 3 controllers), blog (3×3), marketing-pages (2×2), navigation (1×1), media (3×3). Span is outermost so the errored span timing reflects the capture-and-rethrow. - RecordingLogger.captureException now also honours the flag — test capture counts stay honest when both repo and outer layer wrap. Tests: - packages/core-shared/src/instrumentation/with-capture.test.ts — 4 cases covering success, capture-on-throw, mark-on-capture, no-double via the flag. - packages/blog/tests/r44-no-double-capture.test.ts — 3 cases: repo throw → 1 capture with repo tags; controller parse fail → 1 capture with controller tags; success → 0 captures. Verification: pnpm test 26/26, pnpm lint 15/15, pnpm typecheck 14/14. Docs: ADR-014 and the refactor log gain a "Post-merge follow-up" section recording the gap, the fix, and the underlying lesson (don't describe intent as shipped state — grep first). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
92 lines
9.8 KiB
Markdown
92 lines
9.8 KiB
Markdown
# ADR-014 — Instrumentation & Sentry Logging
|
||
|
||
**Status:** Accepted
|
||
**Date:** 2026-05-06
|
||
**Spec:** docs/superpowers/specs/2026-05-06-instrumentation-sentry-design.md
|
||
**Plan:** docs/superpowers/plans/2026-05-06-plan-10-instrumentation-sentry.md
|
||
|
||
## Context
|
||
|
||
The monorepo had no distributed tracing or error capture. Production failures surfaced only via stdout / stderr from a Vercel function log. We needed:
|
||
|
||
1. End-to-end traces (browser → tRPC → use case → repo → Payload) to diagnose latency without ad-hoc timers.
|
||
2. Centralized exception capture so silent failures (especially in CMS mutations) get a permanent record with stack + context.
|
||
3. Privacy posture suitable for production: scrubbing for PII, masked replay, no opaque-vs-named user identifiers.
|
||
|
||
The Lazar Nikolov reference repo (`nextjs-clean-architecture`) demonstrates a Sentry-driven pattern with `Sentry.startSpan` and `Sentry.captureException` calls inline in use cases and repos. We needed to adapt this to:
|
||
|
||
- Three apps (web-next, cms, web-tanstack) — not one.
|
||
- Per-feature DI (ADR-008) — not a single container.
|
||
- Vendor neutrality — feature packages must not import `@sentry/*` directly.
|
||
|
||
## Decision
|
||
|
||
**1. Vendor-neutral interfaces in `core-shared/instrumentation/`.** Two interfaces (`ITracer`, `ILogger`) with three implementation pairs (`Noop*`, `Sentry*`, `Recording*`). Feature packages depend only on the interfaces.
|
||
|
||
**2. Full-depth instrumentation.** Spans nest at every layer: tRPC procedure (auto) → controller (DI-wrapped) → use case (DI-wrapped) → repository (explicit `startSpan`) → Payload (auto). Use case + controller spans applied via `withSpan` higher-order wrapper at DI binding time so factory code stays unchanged. Repositories pay explicit boilerplate per method — accepted cost for per-method visibility.
|
||
|
||
**3. Throw-site capture with double-report guard.** `Sentry.captureException` fires only at the layer that originates the error (repos catch infra; use cases catch their own throws; controllers catch parse failures; middleware does not capture). A non-enumerable `__sentryReported` flag prevents re-capture as errors bubble.
|
||
|
||
**4. Hard PII rules (R31–R38).** `sendDefaultPii: false` (CI-grep enforced); replay default-masks all text/inputs/media (allowlist empty by default); `beforeSend` / `beforeSendTransaction` scrubbers strip emails/passwords/tokens/cookies/auth/IPs (substring-matched, including derived names like `userEmail`, `accessToken`, `apiKey`, `ipAddress`); `setUser` accepts only `{ id }`.
|
||
|
||
**5. Three Sentry projects, orthogonal binding.** Each app gets its own DSN. `bindAll()`'s Rule 0 (DSN → Sentry vs Noop) is independent of `USE_DEV_SEED` / `NODE_ENV` repo binding. Optional dev-mode Sentry: developer can run `pnpm dev` with `SENTRY_DSN` set to test integration locally.
|
||
|
||
**6. ESLint boundary rule (R40).** `no-restricted-imports` blocks `@sentry/*` outside the allowlisted paths: `core-shared/instrumentation/sentry/**`, `instrumentation/di/bind-sentry-instrumentation.{ts,test.ts}`, `core-testing/setup/no-sentry.{ts,test.ts}`, and the apps' `instrumentation*.{ts,mjs}` / `next.config.{mjs}` / `vite.config.{ts}` entries. Allowlist patterns use `**/`-prefix so they match whether ESLint runs from the repo root or from inside a sub-package.
|
||
|
||
**7. Test-side `RecordingTracer` / `RecordingLogger`** in `core-testing/instrumentation/`. Tests inject them directly into factory functions (consistent with R27 from Plan 9 — direct-injection, not container manipulation). The `core-testing/setup/no-sentry.ts` setup file mocks `@sentry/nextjs`, `@sentry/node`, and `@sentry/react` at the module level, so any code that imports them gets a no-op surface during vitest runs (R49).
|
||
|
||
## Alternatives considered
|
||
|
||
- **Direct `@sentry/nextjs` imports in features (Lazar's pattern).** Rejected — couples every feature package to a vendor SDK, violating the architecture's vendor-isolation principle.
|
||
- **Procedure-only spans (no per-use-case or per-repo spans).** Rejected — would lose the breakdown that makes a slow request diagnosable. The middle path (procedure + use case + controller, no per-repo) was rejected for the same reason at a finer granularity.
|
||
- **Capture in `defineErrorMiddleware` only.** Rejected — would noisily report every input-parse / unauthenticated error as a Sentry event, polluting the inbox.
|
||
- **Single Sentry project for all apps with environment tags.** Rejected — different alert routing, different quotas. Three projects scale better.
|
||
- **Always-Sentry in all environments.** Rejected — `pnpm test` and `pnpm dev` should not initialize a real SDK by default. Optional dev (DSN-driven) is the cleanest rule.
|
||
- **Replay flags as configurable env vars.** Rejected — the privacy posture must be the default; opt-out requires per-selector justification (R34, R51).
|
||
|
||
## Consequences
|
||
|
||
**Positive:**
|
||
- End-to-end traces in Sentry with full context.
|
||
- One captured event per error (no double-report, no noise from expected domain errors).
|
||
- Privacy-by-default replay and scrubbing.
|
||
- Vendor-swappable: replacing Sentry means writing one new adapter pair in `core-shared/instrumentation/<vendor>/`.
|
||
- Tests run against `Recording*` for assertions; `Noop*` by default.
|
||
|
||
**Negative:**
|
||
- Every public repo method gains ~6 lines of `tracer.startSpan(...)` boilerplate. Mitigated by uniform pattern; if it ever proves excessive, a `withRepoSpan` collapse helper can be added.
|
||
- `__sentryReported` flag mutates errors. Non-enumerable, so JSON / spread are unaffected; flag is checked only inside `SentryLogger`.
|
||
- Three Sentry projects to administer.
|
||
- Replay bundle peak (~250KB on errored sessions) — accepted; healthy sessions don't load replay payload.
|
||
|
||
## Notes from execution
|
||
|
||
- **PII key-substring extension (deviation from spec):** the original spec listed only explicit PII keys (`email`, `password`, `token`, etc.) for substring matching. During Task 25 we added `ipaddress` to `PII_KEY_SUBSTRINGS` so keys like `ipAddress` trigger key-level redaction. This is a tighter privacy posture than the spec's substring list — the spec's intent (no PII in events) is honoured strictly; existing `scrub.test.ts` cases continue to pass.
|
||
- **Web-tanstack vite.config.ts deferred:** the app currently has no `vite.config.ts` (build is a placeholder per its `package.json`). The `@sentry/vite-plugin` dep is added but unused until the TanStack Start build is wired in a later plan. A minimal `src/vite-env.d.ts` shims `ImportMetaEnv` for the client entry until the full Vite types land.
|
||
- **Subpath exports:** `core-shared/package.json` gained five new subpath entries (`./instrumentation/sentry/{init-server,init-client,init-server-node,init-client-react,scrub}`) so the apps' `instrumentation*.ts` files can import the helpers via deep paths without pulling the entire barrel.
|
||
- **`@sentry/node` + `@sentry/react`** added as optional `peerDependencies` of `core-shared` (so feature packages don't transitively pull them) and as `devDependencies` (so typecheck/test runs in `core-shared` resolve them).
|
||
- **Pre-existing lint nits surfaced when Task 28's restricted-imports rule lit up `pnpm lint`:** added `argsIgnorePattern: "^_"` to the shared eslint config (matches the underscore convention used throughout the repo); added `globals.node` for `*.{mjs,cjs,js}` and `*.config.{ts,tsx}` so `next.config.mjs`'s `process.env` lints clean; cleared two unused-import / unused-disable nits in marketing-pages and core-testing that were unrelated to instrumentation but blocked the lint gate.
|
||
- **Direct `@repo/core-shared` deps:** `apps/cms` and `apps/web-tanstack` previously had only transitive access to `@repo/core-shared`; both gained explicit `workspace:*` deps so the deep `./instrumentation/sentry/*` subpath imports resolve.
|
||
|
||
## Post-merge follow-up — closing the R44 gap
|
||
|
||
Plan 10 as merged shipped repository-side capture (R43) but **not** use-case or controller capture (R44). The ADR/AGENTS docs described the intended capture-rules table as if it were the as-shipped state; in fact every `captureException` call site lived in `infrastructure/repositories/*.repository.ts`. A grep proved it: zero call sites in any controller or use-case body. The user spotted the gap.
|
||
|
||
**Fix (post-merge commit):**
|
||
|
||
1. Extracted the `__sentryReported` flag helpers into `core-shared/instrumentation/reported-flag.ts` (`markReported`, `isReported`). `SentryLogger` now imports them; `RecordingLogger` carries an inlined copy (tooling → core import is disallowed by the boundary rule, so duplication is the right tradeoff).
|
||
2. Added `withCapture(logger, tags, fn)` higher-order wrapper at `core-shared/instrumentation/with-capture.ts`, parallel to `withSpan`. On error: capture-with-tags, mark, re-throw — but bail if the flag is already set (covers the bubbled-from-repo case).
|
||
3. Applied `withSpan(withCapture(factory))` to every use case and controller in every feature's `bind-production.ts` and `bind-dev-seed.ts`. Span is outermost so the errored span's timing reflects the capture-and-rethrow.
|
||
4. `RecordingLogger.captureException` now also honours the flag, so test assertions about capture counts stay honest.
|
||
5. Added `packages/blog/tests/r44-no-double-capture.test.ts` to lock the contract: an error originated in the repo is captured once with repo tags; an error originated in the controller (parse failure) is captured once with controller tags; success paths capture nothing.
|
||
|
||
**Why use cases also wrap, even though current bodies mostly delegate to the repo:** R44's intent is that *any* throw originated locally — output-schema validation, business-rule errors like `AuthenticationError` in `signInUseCase` — gets captured with use-case tags. The wrapper makes the rule uniform; the flag makes it safe.
|
||
|
||
## Related
|
||
|
||
- ADR-008 — per-feature DI containers
|
||
- ADR-011 — TDD foundation
|
||
- ADR-012 — Lazar pattern conformance
|
||
- ADR-013 — input/output unification
|
||
- Plan 10 spec (R31–R55)
|