Files
agentic-dev/docs/decisions/adr-014-instrumentation-sentry.md
Danijel Martinek 841655573b docs(adr): rename ADR-012 — drop Lazar; update title + content + cross-refs
- Rename docs/decisions/adr-012-lazar-conformance.md → adr-012-feature-conventions.md
- Strip "Lazar", "Plan 8/9/10/11", "refactor-logs" refs from all ADRs,
  architecture docs, HTML explainers, and feature/core AGENTS.md files
- Update all incoming links in docs/, packages/*/AGENTS.md, HTML explainers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 10:07:37 +02:00

92 lines
9.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# ADR-014 — Instrumentation & Sentry Logging
**Status:** Accepted
**Status (revised):** Superseded by ADR-017 for the implementation layer. The interface decisions remain authoritative.
**Date:** 2026-05-06
## Context
The monorepo had no distributed tracing or error capture. Production failures surfaced only via stdout / stderr from a Vercel function log. We needed:
1. End-to-end traces (browser → tRPC → use case → repo → Payload) to diagnose latency without ad-hoc timers.
2. Centralized exception capture so silent failures (especially in CMS mutations) get a permanent record with stack + context.
3. Privacy posture suitable for production: scrubbing for PII, masked replay, no opaque-vs-named user identifiers.
The reference Clean Architecture repo demonstrates a Sentry-driven pattern with `Sentry.startSpan` and `Sentry.captureException` calls inline in use cases and repos. We needed to adapt this to:
- Three apps (web-next, cms, web-tanstack) — not one.
- Per-feature DI (ADR-008) — not a single container.
- Vendor neutrality — feature packages must not import `@sentry/*` directly.
## Decision
**1. Vendor-neutral interfaces in `core-shared/instrumentation/`.** Two interfaces (`ITracer`, `ILogger`) with three implementation pairs (`Noop*`, `Sentry*`, `Recording*`). Feature packages depend only on the interfaces.
**2. Full-depth instrumentation.** Spans nest at every layer: tRPC procedure (auto) → controller (DI-wrapped) → use case (DI-wrapped) → repository (explicit `startSpan`) → Payload (auto). Use case + controller spans applied via `withSpan` higher-order wrapper at DI binding time so factory code stays unchanged. Repositories pay explicit boilerplate per method — accepted cost for per-method visibility.
**3. Throw-site capture with double-report guard.** `Sentry.captureException` fires only at the layer that originates the error (repos catch infra; use cases catch their own throws; controllers catch parse failures; middleware does not capture). A non-enumerable `__sentryReported` flag prevents re-capture as errors bubble.
**4. Hard PII rules (R31R38).** `sendDefaultPii: false` (CI-grep enforced); replay default-masks all text/inputs/media (allowlist empty by default); `beforeSend` / `beforeSendTransaction` scrubbers strip emails/passwords/tokens/cookies/auth/IPs (substring-matched, including derived names like `userEmail`, `accessToken`, `apiKey`, `ipAddress`); `setUser` accepts only `{ id }`.
**5. Three Sentry projects, orthogonal binding.** Each app gets its own DSN. `bindAll()`'s Rule 0 (DSN → Sentry vs Noop) is independent of `USE_DEV_SEED` / `NODE_ENV` repo binding. Optional dev-mode Sentry: developer can run `pnpm dev` with `SENTRY_DSN` set to test integration locally.
**6. ESLint boundary rule (R40).** `no-restricted-imports` blocks `@sentry/*` outside the allowlisted paths: `core-shared/instrumentation/sentry/**`, `instrumentation/di/bind-sentry-instrumentation.{ts,test.ts}`, `core-testing/setup/no-sentry.{ts,test.ts}`, and the apps' `instrumentation*.{ts,mjs}` / `next.config.{mjs}` / `vite.config.{ts}` entries. Allowlist patterns use `**/`-prefix so they match whether ESLint runs from the repo root or from inside a sub-package.
**7. Test-side `RecordingTracer` / `RecordingLogger`** in `core-testing/instrumentation/`. Tests inject them directly into factory functions (direct-injection, not container manipulation). The `core-testing/setup/no-sentry.ts` setup file mocks `@sentry/nextjs`, `@sentry/node`, and `@sentry/react` at the module level, so any code that imports them gets a no-op surface during vitest runs.
## Alternatives considered
- **Direct `@sentry/nextjs` imports in features.** Rejected — couples every feature package to a vendor SDK, violating the architecture's vendor-isolation principle.
- **Procedure-only spans (no per-use-case or per-repo spans).** Rejected — would lose the breakdown that makes a slow request diagnosable. The middle path (procedure + use case + controller, no per-repo) was rejected for the same reason at a finer granularity.
- **Capture in `defineErrorMiddleware` only.** Rejected — would noisily report every input-parse / unauthenticated error as a Sentry event, polluting the inbox.
- **Single Sentry project for all apps with environment tags.** Rejected — different alert routing, different quotas. Three projects scale better.
- **Always-Sentry in all environments.** Rejected — `pnpm test` and `pnpm dev` should not initialize a real SDK by default. Optional dev (DSN-driven) is the cleanest rule.
- **Replay flags as configurable env vars.** Rejected — the privacy posture must be the default; opt-out requires per-selector justification (R34, R51).
## Consequences
**Positive:**
- End-to-end traces in Sentry with full context.
- One captured event per error (no double-report, no noise from expected domain errors).
- Privacy-by-default replay and scrubbing.
- Vendor-swappable: replacing Sentry means writing one new adapter pair in `core-shared/instrumentation/<vendor>/`.
- Tests run against `Recording*` for assertions; `Noop*` by default.
**Negative:**
- Every public repo method gains ~6 lines of `tracer.startSpan(...)` boilerplate. Mitigated by uniform pattern; if it ever proves excessive, a `withRepoSpan` collapse helper can be added.
- `__sentryReported` flag mutates errors. Non-enumerable, so JSON / spread are unaffected; flag is checked only inside `SentryLogger`.
- Three Sentry projects to administer.
- Replay bundle peak (~250KB on errored sessions) — accepted; healthy sessions don't load replay payload.
## Notes from execution
- **PII key-substring extension (deviation from spec):** the original spec listed only explicit PII keys (`email`, `password`, `token`, etc.) for substring matching. During Task 25 we added `ipaddress` to `PII_KEY_SUBSTRINGS` so keys like `ipAddress` trigger key-level redaction. This is a tighter privacy posture than the spec's substring list — the spec's intent (no PII in events) is honoured strictly; existing `scrub.test.ts` cases continue to pass.
- **Web-tanstack vite.config.ts deferred:** the app currently has no `vite.config.ts` (build is a placeholder per its `package.json`). The `@sentry/vite-plugin` dep is added but unused until the TanStack Start build is wired in a later plan. A minimal `src/vite-env.d.ts` shims `ImportMetaEnv` for the client entry until the full Vite types land.
- **Subpath exports:** `core-shared/package.json` gained five new subpath entries (`./instrumentation/sentry/{init-server,init-client,init-server-node,init-client-react,scrub}`) so the apps' `instrumentation*.ts` files can import the helpers via deep paths without pulling the entire barrel.
- **`@sentry/node` + `@sentry/react`** added as optional `peerDependencies` of `core-shared` (so feature packages don't transitively pull them) and as `devDependencies` (so typecheck/test runs in `core-shared` resolve them).
- **Pre-existing lint nits surfaced when Task 28's restricted-imports rule lit up `pnpm lint`:** added `argsIgnorePattern: "^_"` to the shared eslint config (matches the underscore convention used throughout the repo); added `globals.node` for `*.{mjs,cjs,js}` and `*.config.{ts,tsx}` so `next.config.mjs`'s `process.env` lints clean; cleared two unused-import / unused-disable nits in marketing-pages and core-testing that were unrelated to instrumentation but blocked the lint gate.
- **Direct `@repo/core-shared` deps:** `apps/cms` and `apps/web-tanstack` previously had only transitive access to `@repo/core-shared`; both gained explicit `workspace:*` deps so the deep `./instrumentation/sentry/*` subpath imports resolve.
## Post-merge follow-up — closing the R44 gap
The initial implementation shipped repository-side capture but **not** use-case or controller capture. The ADR/AGENTS docs described the intended capture-rules table as if it were the as-shipped state; in fact every `captureException` call site lived in `infrastructure/repositories/*.repository.ts`. A grep proved it: zero call sites in any controller or use-case body. The gap was spotted and fixed.
**Fix (post-merge commit):**
1. Extracted the `__sentryReported` flag helpers into `core-shared/instrumentation/reported-flag.ts` (`markReported`, `isReported`). `SentryLogger` now imports them; `RecordingLogger` carries an inlined copy (tooling → core import is disallowed by the boundary rule, so duplication is the right tradeoff).
2. Added `withCapture(logger, tags, fn)` higher-order wrapper at `core-shared/instrumentation/with-capture.ts`, parallel to `withSpan`. On error: capture-with-tags, mark, re-throw — but bail if the flag is already set (covers the bubbled-from-repo case).
3. Applied `withSpan(withCapture(factory))` to every use case and controller in every feature's `bind-production.ts` and `bind-dev-seed.ts`. Span is outermost so the errored span's timing reflects the capture-and-rethrow.
4. `RecordingLogger.captureException` now also honours the flag, so test assertions about capture counts stay honest.
5. Added `packages/blog/tests/r44-no-double-capture.test.ts` to lock the contract: an error originated in the repo is captured once with repo tags; an error originated in the controller (parse failure) is captured once with controller tags; success paths capture nothing.
**Why use cases also wrap, even though current bodies mostly delegate to the repo:** R44's intent is that _any_ throw originated locally — output-schema validation, business-rule errors like `AuthenticationError` in `signInUseCase` — gets captured with use-case tags. The wrapper makes the rule uniform; the flag makes it safe.
## Related
- ADR-008 — per-feature DI containers
- ADR-011 — TDD foundation
- ADR-012 — feature conventions
- ADR-013 — input/output unification