- Rename docs/decisions/adr-012-lazar-conformance.md → adr-012-feature-conventions.md - Strip "Lazar", "Plan 8/9/10/11", "refactor-logs" refs from all ADRs, architecture docs, HTML explainers, and feature/core AGENTS.md files - Update all incoming links in docs/, packages/*/AGENTS.md, HTML explainers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9.6 KiB
ADR-014 — Instrumentation & Sentry Logging
Status: Accepted Status (revised): Superseded by ADR-017 for the implementation layer. The interface decisions remain authoritative. Date: 2026-05-06
Context
The monorepo had no distributed tracing or error capture. Production failures surfaced only via stdout / stderr from a Vercel function log. We needed:
- End-to-end traces (browser → tRPC → use case → repo → Payload) to diagnose latency without ad-hoc timers.
- Centralized exception capture so silent failures (especially in CMS mutations) get a permanent record with stack + context.
- Privacy posture suitable for production: scrubbing for PII, masked replay, no opaque-vs-named user identifiers.
The reference Clean Architecture repo demonstrates a Sentry-driven pattern with Sentry.startSpan and Sentry.captureException calls inline in use cases and repos. We needed to adapt this to:
- Three apps (web-next, cms, web-tanstack) — not one.
- Per-feature DI (ADR-008) — not a single container.
- Vendor neutrality — feature packages must not import
@sentry/*directly.
Decision
1. Vendor-neutral interfaces in core-shared/instrumentation/. Two interfaces (ITracer, ILogger) with three implementation pairs (Noop*, Sentry*, Recording*). Feature packages depend only on the interfaces.
2. Full-depth instrumentation. Spans nest at every layer: tRPC procedure (auto) → controller (DI-wrapped) → use case (DI-wrapped) → repository (explicit startSpan) → Payload (auto). Use case + controller spans applied via withSpan higher-order wrapper at DI binding time so factory code stays unchanged. Repositories pay explicit boilerplate per method — accepted cost for per-method visibility.
3. Throw-site capture with double-report guard. Sentry.captureException fires only at the layer that originates the error (repos catch infra; use cases catch their own throws; controllers catch parse failures; middleware does not capture). A non-enumerable __sentryReported flag prevents re-capture as errors bubble.
4. Hard PII rules (R31–R38). sendDefaultPii: false (CI-grep enforced); replay default-masks all text/inputs/media (allowlist empty by default); beforeSend / beforeSendTransaction scrubbers strip emails/passwords/tokens/cookies/auth/IPs (substring-matched, including derived names like userEmail, accessToken, apiKey, ipAddress); setUser accepts only { id }.
5. Three Sentry projects, orthogonal binding. Each app gets its own DSN. bindAll()'s Rule 0 (DSN → Sentry vs Noop) is independent of USE_DEV_SEED / NODE_ENV repo binding. Optional dev-mode Sentry: developer can run pnpm dev with SENTRY_DSN set to test integration locally.
6. ESLint boundary rule (R40). no-restricted-imports blocks @sentry/* outside the allowlisted paths: core-shared/instrumentation/sentry/**, instrumentation/di/bind-sentry-instrumentation.{ts,test.ts}, core-testing/setup/no-sentry.{ts,test.ts}, and the apps' instrumentation*.{ts,mjs} / next.config.{mjs} / vite.config.{ts} entries. Allowlist patterns use **/-prefix so they match whether ESLint runs from the repo root or from inside a sub-package.
7. Test-side RecordingTracer / RecordingLogger in core-testing/instrumentation/. Tests inject them directly into factory functions (direct-injection, not container manipulation). The core-testing/setup/no-sentry.ts setup file mocks @sentry/nextjs, @sentry/node, and @sentry/react at the module level, so any code that imports them gets a no-op surface during vitest runs.
Alternatives considered
- Direct
@sentry/nextjsimports in features. Rejected — couples every feature package to a vendor SDK, violating the architecture's vendor-isolation principle. - Procedure-only spans (no per-use-case or per-repo spans). Rejected — would lose the breakdown that makes a slow request diagnosable. The middle path (procedure + use case + controller, no per-repo) was rejected for the same reason at a finer granularity.
- Capture in
defineErrorMiddlewareonly. Rejected — would noisily report every input-parse / unauthenticated error as a Sentry event, polluting the inbox. - Single Sentry project for all apps with environment tags. Rejected — different alert routing, different quotas. Three projects scale better.
- Always-Sentry in all environments. Rejected —
pnpm testandpnpm devshould not initialize a real SDK by default. Optional dev (DSN-driven) is the cleanest rule. - Replay flags as configurable env vars. Rejected — the privacy posture must be the default; opt-out requires per-selector justification (R34, R51).
Consequences
Positive:
- End-to-end traces in Sentry with full context.
- One captured event per error (no double-report, no noise from expected domain errors).
- Privacy-by-default replay and scrubbing.
- Vendor-swappable: replacing Sentry means writing one new adapter pair in
core-shared/instrumentation/<vendor>/. - Tests run against
Recording*for assertions;Noop*by default.
Negative:
- Every public repo method gains ~6 lines of
tracer.startSpan(...)boilerplate. Mitigated by uniform pattern; if it ever proves excessive, awithRepoSpancollapse helper can be added. __sentryReportedflag mutates errors. Non-enumerable, so JSON / spread are unaffected; flag is checked only insideSentryLogger.- Three Sentry projects to administer.
- Replay bundle peak (~250KB on errored sessions) — accepted; healthy sessions don't load replay payload.
Notes from execution
- PII key-substring extension (deviation from spec): the original spec listed only explicit PII keys (
email,password,token, etc.) for substring matching. During Task 25 we addedipaddresstoPII_KEY_SUBSTRINGSso keys likeipAddresstrigger key-level redaction. This is a tighter privacy posture than the spec's substring list — the spec's intent (no PII in events) is honoured strictly; existingscrub.test.tscases continue to pass. - Web-tanstack vite.config.ts deferred: the app currently has no
vite.config.ts(build is a placeholder per itspackage.json). The@sentry/vite-plugindep is added but unused until the TanStack Start build is wired in a later plan. A minimalsrc/vite-env.d.tsshimsImportMetaEnvfor the client entry until the full Vite types land. - Subpath exports:
core-shared/package.jsongained five new subpath entries (./instrumentation/sentry/{init-server,init-client,init-server-node,init-client-react,scrub}) so the apps'instrumentation*.tsfiles can import the helpers via deep paths without pulling the entire barrel. @sentry/node+@sentry/reactadded as optionalpeerDependenciesofcore-shared(so feature packages don't transitively pull them) and asdevDependencies(so typecheck/test runs incore-sharedresolve them).- Pre-existing lint nits surfaced when Task 28's restricted-imports rule lit up
pnpm lint: addedargsIgnorePattern: "^_"to the shared eslint config (matches the underscore convention used throughout the repo); addedglobals.nodefor*.{mjs,cjs,js}and*.config.{ts,tsx}sonext.config.mjs'sprocess.envlints clean; cleared two unused-import / unused-disable nits in marketing-pages and core-testing that were unrelated to instrumentation but blocked the lint gate. - Direct
@repo/core-shareddeps:apps/cmsandapps/web-tanstackpreviously had only transitive access to@repo/core-shared; both gained explicitworkspace:*deps so the deep./instrumentation/sentry/*subpath imports resolve.
Post-merge follow-up — closing the R44 gap
The initial implementation shipped repository-side capture but not use-case or controller capture. The ADR/AGENTS docs described the intended capture-rules table as if it were the as-shipped state; in fact every captureException call site lived in infrastructure/repositories/*.repository.ts. A grep proved it: zero call sites in any controller or use-case body. The gap was spotted and fixed.
Fix (post-merge commit):
- Extracted the
__sentryReportedflag helpers intocore-shared/instrumentation/reported-flag.ts(markReported,isReported).SentryLoggernow imports them;RecordingLoggercarries an inlined copy (tooling → core import is disallowed by the boundary rule, so duplication is the right tradeoff). - Added
withCapture(logger, tags, fn)higher-order wrapper atcore-shared/instrumentation/with-capture.ts, parallel towithSpan. On error: capture-with-tags, mark, re-throw — but bail if the flag is already set (covers the bubbled-from-repo case). - Applied
withSpan(withCapture(factory))to every use case and controller in every feature'sbind-production.tsandbind-dev-seed.ts. Span is outermost so the errored span's timing reflects the capture-and-rethrow. RecordingLogger.captureExceptionnow also honours the flag, so test assertions about capture counts stay honest.- Added
packages/blog/tests/r44-no-double-capture.test.tsto lock the contract: an error originated in the repo is captured once with repo tags; an error originated in the controller (parse failure) is captured once with controller tags; success paths capture nothing.
Why use cases also wrap, even though current bodies mostly delegate to the repo: R44's intent is that any throw originated locally — output-schema validation, business-rule errors like AuthenticationError in signInUseCase — gets captured with use-case tags. The wrapper makes the rule uniform; the flag makes it safe.
Related
- ADR-008 — per-feature DI containers
- ADR-011 — TDD foundation
- ADR-012 — feature conventions
- ADR-013 — input/output unification