Files
agentic-dev-template/docs/decisions/adr-014-instrumentation-sentry.md
Danijel Martinek f0775d6ecc feat(instrumentation): close R44 gap — throw-site capture for use cases + controllers
Plan 10 documented R44 (capture at originating-throw layer) but only the
R43 repo leg was wired. captureException had zero call sites in any
controller or use-case body. This commit closes the gap.

Mechanism:
- Extract __sentryReported flag helpers into core-shared/instrumentation/
  reported-flag.ts. SentryLogger switches to importing them; RecordingLogger
  carries an inlined copy (tooling → core boundary disallows the import).
- Add withCapture(logger, tags, fn) higher-order wrapper paralleling
  withSpan. On throw: capture-with-tags, mark, re-throw. Bail if the flag
  was already set — covers the bubbled-from-repo case so each error
  surfaces in the logger exactly once with the inner-most layer's tags.
- Apply withSpan(withCapture(factory)) in every feature's bind-production
  and bind-dev-seed: auth (3 use cases × 3 controllers), blog (3×3),
  marketing-pages (2×2), navigation (1×1), media (3×3). Span is outermost
  so the errored span timing reflects the capture-and-rethrow.
- RecordingLogger.captureException now also honours the flag — test
  capture counts stay honest when both repo and outer layer wrap.

Tests:
- packages/core-shared/src/instrumentation/with-capture.test.ts —
  4 cases covering success, capture-on-throw, mark-on-capture, no-double
  via the flag.
- packages/blog/tests/r44-no-double-capture.test.ts — 3 cases: repo throw
  → 1 capture with repo tags; controller parse fail → 1 capture with
  controller tags; success → 0 captures.

Verification: pnpm test 26/26, pnpm lint 15/15, pnpm typecheck 14/14.

Docs: ADR-014 and the refactor log gain a "Post-merge follow-up" section
recording the gap, the fix, and the underlying lesson (don't describe
intent as shipped state — grep first).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 00:28:22 +02:00

9.8 KiB
Raw Blame History

ADR-014 — Instrumentation & Sentry Logging

Status: Accepted Date: 2026-05-06 Spec: docs/superpowers/specs/2026-05-06-instrumentation-sentry-design.md Plan: docs/superpowers/plans/2026-05-06-plan-10-instrumentation-sentry.md

Context

The monorepo had no distributed tracing or error capture. Production failures surfaced only via stdout / stderr from a Vercel function log. We needed:

  1. End-to-end traces (browser → tRPC → use case → repo → Payload) to diagnose latency without ad-hoc timers.
  2. Centralized exception capture so silent failures (especially in CMS mutations) get a permanent record with stack + context.
  3. Privacy posture suitable for production: scrubbing for PII, masked replay, no opaque-vs-named user identifiers.

The Lazar Nikolov reference repo (nextjs-clean-architecture) demonstrates a Sentry-driven pattern with Sentry.startSpan and Sentry.captureException calls inline in use cases and repos. We needed to adapt this to:

  • Three apps (web-next, cms, web-tanstack) — not one.
  • Per-feature DI (ADR-008) — not a single container.
  • Vendor neutrality — feature packages must not import @sentry/* directly.

Decision

1. Vendor-neutral interfaces in core-shared/instrumentation/. Two interfaces (ITracer, ILogger) with three implementation pairs (Noop*, Sentry*, Recording*). Feature packages depend only on the interfaces.

2. Full-depth instrumentation. Spans nest at every layer: tRPC procedure (auto) → controller (DI-wrapped) → use case (DI-wrapped) → repository (explicit startSpan) → Payload (auto). Use case + controller spans applied via withSpan higher-order wrapper at DI binding time so factory code stays unchanged. Repositories pay explicit boilerplate per method — accepted cost for per-method visibility.

3. Throw-site capture with double-report guard. Sentry.captureException fires only at the layer that originates the error (repos catch infra; use cases catch their own throws; controllers catch parse failures; middleware does not capture). A non-enumerable __sentryReported flag prevents re-capture as errors bubble.

4. Hard PII rules (R31R38). sendDefaultPii: false (CI-grep enforced); replay default-masks all text/inputs/media (allowlist empty by default); beforeSend / beforeSendTransaction scrubbers strip emails/passwords/tokens/cookies/auth/IPs (substring-matched, including derived names like userEmail, accessToken, apiKey, ipAddress); setUser accepts only { id }.

5. Three Sentry projects, orthogonal binding. Each app gets its own DSN. bindAll()'s Rule 0 (DSN → Sentry vs Noop) is independent of USE_DEV_SEED / NODE_ENV repo binding. Optional dev-mode Sentry: developer can run pnpm dev with SENTRY_DSN set to test integration locally.

6. ESLint boundary rule (R40). no-restricted-imports blocks @sentry/* outside the allowlisted paths: core-shared/instrumentation/sentry/**, instrumentation/di/bind-sentry-instrumentation.{ts,test.ts}, core-testing/setup/no-sentry.{ts,test.ts}, and the apps' instrumentation*.{ts,mjs} / next.config.{mjs} / vite.config.{ts} entries. Allowlist patterns use **/-prefix so they match whether ESLint runs from the repo root or from inside a sub-package.

7. Test-side RecordingTracer / RecordingLogger in core-testing/instrumentation/. Tests inject them directly into factory functions (consistent with R27 from Plan 9 — direct-injection, not container manipulation). The core-testing/setup/no-sentry.ts setup file mocks @sentry/nextjs, @sentry/node, and @sentry/react at the module level, so any code that imports them gets a no-op surface during vitest runs (R49).

Alternatives considered

  • Direct @sentry/nextjs imports in features (Lazar's pattern). Rejected — couples every feature package to a vendor SDK, violating the architecture's vendor-isolation principle.
  • Procedure-only spans (no per-use-case or per-repo spans). Rejected — would lose the breakdown that makes a slow request diagnosable. The middle path (procedure + use case + controller, no per-repo) was rejected for the same reason at a finer granularity.
  • Capture in defineErrorMiddleware only. Rejected — would noisily report every input-parse / unauthenticated error as a Sentry event, polluting the inbox.
  • Single Sentry project for all apps with environment tags. Rejected — different alert routing, different quotas. Three projects scale better.
  • Always-Sentry in all environments. Rejected — pnpm test and pnpm dev should not initialize a real SDK by default. Optional dev (DSN-driven) is the cleanest rule.
  • Replay flags as configurable env vars. Rejected — the privacy posture must be the default; opt-out requires per-selector justification (R34, R51).

Consequences

Positive:

  • End-to-end traces in Sentry with full context.
  • One captured event per error (no double-report, no noise from expected domain errors).
  • Privacy-by-default replay and scrubbing.
  • Vendor-swappable: replacing Sentry means writing one new adapter pair in core-shared/instrumentation/<vendor>/.
  • Tests run against Recording* for assertions; Noop* by default.

Negative:

  • Every public repo method gains ~6 lines of tracer.startSpan(...) boilerplate. Mitigated by uniform pattern; if it ever proves excessive, a withRepoSpan collapse helper can be added.
  • __sentryReported flag mutates errors. Non-enumerable, so JSON / spread are unaffected; flag is checked only inside SentryLogger.
  • Three Sentry projects to administer.
  • Replay bundle peak (~250KB on errored sessions) — accepted; healthy sessions don't load replay payload.

Notes from execution

  • PII key-substring extension (deviation from spec): the original spec listed only explicit PII keys (email, password, token, etc.) for substring matching. During Task 25 we added ipaddress to PII_KEY_SUBSTRINGS so keys like ipAddress trigger key-level redaction. This is a tighter privacy posture than the spec's substring list — the spec's intent (no PII in events) is honoured strictly; existing scrub.test.ts cases continue to pass.
  • Web-tanstack vite.config.ts deferred: the app currently has no vite.config.ts (build is a placeholder per its package.json). The @sentry/vite-plugin dep is added but unused until the TanStack Start build is wired in a later plan. A minimal src/vite-env.d.ts shims ImportMetaEnv for the client entry until the full Vite types land.
  • Subpath exports: core-shared/package.json gained five new subpath entries (./instrumentation/sentry/{init-server,init-client,init-server-node,init-client-react,scrub}) so the apps' instrumentation*.ts files can import the helpers via deep paths without pulling the entire barrel.
  • @sentry/node + @sentry/react added as optional peerDependencies of core-shared (so feature packages don't transitively pull them) and as devDependencies (so typecheck/test runs in core-shared resolve them).
  • Pre-existing lint nits surfaced when Task 28's restricted-imports rule lit up pnpm lint: added argsIgnorePattern: "^_" to the shared eslint config (matches the underscore convention used throughout the repo); added globals.node for *.{mjs,cjs,js} and *.config.{ts,tsx} so next.config.mjs's process.env lints clean; cleared two unused-import / unused-disable nits in marketing-pages and core-testing that were unrelated to instrumentation but blocked the lint gate.
  • Direct @repo/core-shared deps: apps/cms and apps/web-tanstack previously had only transitive access to @repo/core-shared; both gained explicit workspace:* deps so the deep ./instrumentation/sentry/* subpath imports resolve.

Post-merge follow-up — closing the R44 gap

Plan 10 as merged shipped repository-side capture (R43) but not use-case or controller capture (R44). The ADR/AGENTS docs described the intended capture-rules table as if it were the as-shipped state; in fact every captureException call site lived in infrastructure/repositories/*.repository.ts. A grep proved it: zero call sites in any controller or use-case body. The user spotted the gap.

Fix (post-merge commit):

  1. Extracted the __sentryReported flag helpers into core-shared/instrumentation/reported-flag.ts (markReported, isReported). SentryLogger now imports them; RecordingLogger carries an inlined copy (tooling → core import is disallowed by the boundary rule, so duplication is the right tradeoff).
  2. Added withCapture(logger, tags, fn) higher-order wrapper at core-shared/instrumentation/with-capture.ts, parallel to withSpan. On error: capture-with-tags, mark, re-throw — but bail if the flag is already set (covers the bubbled-from-repo case).
  3. Applied withSpan(withCapture(factory)) to every use case and controller in every feature's bind-production.ts and bind-dev-seed.ts. Span is outermost so the errored span's timing reflects the capture-and-rethrow.
  4. RecordingLogger.captureException now also honours the flag, so test assertions about capture counts stay honest.
  5. Added packages/blog/tests/r44-no-double-capture.test.ts to lock the contract: an error originated in the repo is captured once with repo tags; an error originated in the controller (parse failure) is captured once with controller tags; success paths capture nothing.

Why use cases also wrap, even though current bodies mostly delegate to the repo: R44's intent is that any throw originated locally — output-schema validation, business-rule errors like AuthenticationError in signInUseCase — gets captured with use-case tags. The wrapper makes the rule uniform; the flag makes it safe.

  • ADR-008 — per-feature DI containers
  • ADR-011 — TDD foundation
  • ADR-012 — Lazar pattern conformance
  • ADR-013 — input/output unification
  • Plan 10 spec (R31R55)