agentic-dev-template/docs/decisions/adr-014-instrumentation-sentry.md

# ADR-014 — Instrumentation & Sentry Logging

**Status:** Accepted
**Date:** 2026-05-06
**Spec:** docs/superpowers/specs/2026-05-06-instrumentation-sentry-design.md
**Plan:** docs/superpowers/plans/2026-05-06-plan-10-instrumentation-sentry.md

## Context

The monorepo had no distributed tracing or error capture. Production failures surfaced only via stdout / stderr from a Vercel function log. We needed:

1. End-to-end traces (browser → tRPC → use case → repo → Payload) to diagnose latency without ad-hoc timers.
2. Centralized exception capture so silent failures (especially in CMS mutations) get a permanent record with stack + context.
3. Privacy posture suitable for production: scrubbing for PII, masked replay, no opaque-vs-named user identifiers.

The Lazar Nikolov reference repo (`nextjs-clean-architecture`) demonstrates a Sentry-driven pattern with `Sentry.startSpan` and `Sentry.captureException` calls inline in use cases and repos. We needed to adapt this to:

- Three apps (web-next, cms, web-tanstack) — not one.
- Per-feature DI (ADR-008) — not a single container.
- Vendor neutrality — feature packages must not import `@sentry/*` directly.

## Decision

**1. Vendor-neutral interfaces in `core-shared/instrumentation/`.** Two interfaces (`ITracer`, `ILogger`) with three implementation pairs (`Noop*`, `Sentry*`, `Recording*`). Feature packages depend only on the interfaces.

**2. Full-depth instrumentation.** Spans nest at every layer: tRPC procedure (auto) → controller (DI-wrapped) → use case (DI-wrapped) → repository (explicit `startSpan`) → Payload (auto). Use case + controller spans applied via `withSpan` higher-order wrapper at DI binding time so factory code stays unchanged. Repositories pay explicit boilerplate per method — accepted cost for per-method visibility.

**3. Throw-site capture with double-report guard.** `Sentry.captureException` fires only at the layer that originates the error (repos catch infra; use cases catch their own throws; controllers catch parse failures; middleware does not capture). A non-enumerable `__sentryReported` flag prevents re-capture as errors bubble.

**4. Hard PII rules (R31–R38).** `sendDefaultPii: false` (CI-grep enforced); replay default-masks all text/inputs/media (allowlist empty by default); `beforeSend` / `beforeSendTransaction` scrubbers strip emails/passwords/tokens/cookies/auth/IPs (substring-matched, including derived names like `userEmail`, `accessToken`, `apiKey`, `ipAddress`); `setUser` accepts only `{ id }`.

**5. Three Sentry projects, orthogonal binding.** Each app gets its own DSN. `bindAll()`'s Rule 0 (DSN → Sentry vs Noop) is independent of `USE_DEV_SEED` / `NODE_ENV` repo binding. Optional dev-mode Sentry: developer can run `pnpm dev` with `SENTRY_DSN` set to test integration locally.

**6. ESLint boundary rule (R40).** `no-restricted-imports` blocks `@sentry/*` outside the allowlisted paths: `core-shared/instrumentation/sentry/**`, `instrumentation/di/bind-sentry-instrumentation.{ts,test.ts}`, `core-testing/setup/no-sentry.{ts,test.ts}`, and the apps' `instrumentation*.{ts,mjs}` / `next.config.{mjs}` / `vite.config.{ts}` entries. Allowlist patterns use `**/`-prefix so they match whether ESLint runs from the repo root or from inside a sub-package.

**7. Test-side `RecordingTracer` / `RecordingLogger`** in `core-testing/instrumentation/`. Tests inject them directly into factory functions (consistent with R27 from Plan 9 — direct-injection, not container manipulation). The `core-testing/setup/no-sentry.ts` setup file mocks `@sentry/nextjs`, `@sentry/node`, and `@sentry/react` at the module level, so any code that imports them gets a no-op surface during vitest runs (R49).

## Alternatives considered

- **Direct `@sentry/nextjs` imports in features (Lazar's pattern).** Rejected — couples every feature package to a vendor SDK, violating the architecture's vendor-isolation principle.
- **Procedure-only spans (no per-use-case or per-repo spans).** Rejected — would lose the breakdown that makes a slow request diagnosable. The middle path (procedure + use case + controller, no per-repo) was rejected for the same reason at a finer granularity.
- **Capture in `defineErrorMiddleware` only.** Rejected — would noisily report every input-parse / unauthenticated error as a Sentry event, polluting the inbox.
- **Single Sentry project for all apps with environment tags.** Rejected — different alert routing, different quotas. Three projects scale better.
- **Always-Sentry in all environments.** Rejected — `pnpm test` and `pnpm dev` should not initialize a real SDK by default. Optional dev (DSN-driven) is the cleanest rule.
- **Replay flags as configurable env vars.** Rejected — the privacy posture must be the default; opt-out requires per-selector justification (R34, R51).

## Consequences

**Positive:**
- End-to-end traces in Sentry with full context.
- One captured event per error (no double-report, no noise from expected domain errors).
- Privacy-by-default replay and scrubbing.
- Vendor-swappable: replacing Sentry means writing one new adapter pair in `core-shared/instrumentation/<vendor>/`.
- Tests run against `Recording*` for assertions; `Noop*` by default.

**Negative:**
- Every public repo method gains ~6 lines of `tracer.startSpan(...)` boilerplate. Mitigated by uniform pattern; if it ever proves excessive, a `withRepoSpan` collapse helper can be added.
- `__sentryReported` flag mutates errors. Non-enumerable, so JSON / spread are unaffected; flag is checked only inside `SentryLogger`.
- Three Sentry projects to administer.
- Replay bundle peak (~250KB on errored sessions) — accepted; healthy sessions don't load replay payload.

## Notes from execution

- **PII key-substring extension (deviation from spec):** the original spec listed only explicit PII keys (`email`, `password`, `token`, etc.) for substring matching. During Task 25 we added `ipaddress` to `PII_KEY_SUBSTRINGS` so keys like `ipAddress` trigger key-level redaction. This is a tighter privacy posture than the spec's substring list — the spec's intent (no PII in events) is honoured strictly; existing `scrub.test.ts` cases continue to pass.
- **Web-tanstack vite.config.ts deferred:** the app currently has no `vite.config.ts` (build is a placeholder per its `package.json`). The `@sentry/vite-plugin` dep is added but unused until the TanStack Start build is wired in a later plan. A minimal `src/vite-env.d.ts` shims `ImportMetaEnv` for the client entry until the full Vite types land.
- **Subpath exports:** `core-shared/package.json` gained five new subpath entries (`./instrumentation/sentry/{init-server,init-client,init-server-node,init-client-react,scrub}`) so the apps' `instrumentation*.ts` files can import the helpers via deep paths without pulling the entire barrel.
- **`@sentry/node` + `@sentry/react`** added as optional `peerDependencies` of `core-shared` (so feature packages don't transitively pull them) and as `devDependencies` (so typecheck/test runs in `core-shared` resolve them).
- **Pre-existing lint nits surfaced when Task 28's restricted-imports rule lit up `pnpm lint`:** added `argsIgnorePattern: "^_"` to the shared eslint config (matches the underscore convention used throughout the repo); added `globals.node` for `*.{mjs,cjs,js}` and `*.config.{ts,tsx}` so `next.config.mjs`'s `process.env` lints clean; cleared two unused-import / unused-disable nits in marketing-pages and core-testing that were unrelated to instrumentation but blocked the lint gate.
- **Direct `@repo/core-shared` deps:** `apps/cms` and `apps/web-tanstack` previously had only transitive access to `@repo/core-shared`; both gained explicit `workspace:*` deps so the deep `./instrumentation/sentry/*` subpath imports resolve.

## Post-merge follow-up — closing the R44 gap

Plan 10 as merged shipped repository-side capture (R43) but **not** use-case or controller capture (R44). The ADR/AGENTS docs described the intended capture-rules table as if it were the as-shipped state; in fact every `captureException` call site lived in `infrastructure/repositories/*.repository.ts`. A grep proved it: zero call sites in any controller or use-case body. The user spotted the gap.

**Fix (post-merge commit):**

1. Extracted the `__sentryReported` flag helpers into `core-shared/instrumentation/reported-flag.ts` (`markReported`, `isReported`). `SentryLogger` now imports them; `RecordingLogger` carries an inlined copy (tooling → core import is disallowed by the boundary rule, so duplication is the right tradeoff).
2. Added `withCapture(logger, tags, fn)` higher-order wrapper at `core-shared/instrumentation/with-capture.ts`, parallel to `withSpan`. On error: capture-with-tags, mark, re-throw — but bail if the flag is already set (covers the bubbled-from-repo case).
3. Applied `withSpan(withCapture(factory))` to every use case and controller in every feature's `bind-production.ts` and `bind-dev-seed.ts`. Span is outermost so the errored span's timing reflects the capture-and-rethrow.
4. `RecordingLogger.captureException` now also honours the flag, so test assertions about capture counts stay honest.
5. Added `packages/blog/tests/r44-no-double-capture.test.ts` to lock the contract: an error originated in the repo is captured once with repo tags; an error originated in the controller (parse failure) is captured once with controller tags; success paths capture nothing.

**Why use cases also wrap, even though current bodies mostly delegate to the repo:** R44's intent is that *any* throw originated locally — output-schema validation, business-rule errors like `AuthenticationError` in `signInUseCase` — gets captured with use-case tags. The wrapper makes the rule uniform; the flag makes it safe.

## Related

- ADR-008 — per-feature DI containers
- ADR-011 — TDD foundation
- ADR-012 — Lazar pattern conformance
- ADR-013 — input/output unification
- Plan 10 spec (R31–R55)