agentic-dev/docs/work/prds/compliance-manifests-pii-retention-subprocessors.prd.md
Danijel Martinek d3278c0aa1 docs: seed PRD for compliance manifests epic (ADR-025 Epic A)
Implementation seed for ADR-025 Epic A: declarative PII inventory +
retention + sub-processor manifests with three generators, pre-commit +
CI drift detection, background purge job, and ADR-022 trace
frontmatter extension for sub-processor fields (discriminated union).
Eleven user stories ordered for the decomposer; five open questions
with recommendations. Status: approved — ready for pnpm work decompose.
2026-05-18 19:55:37 +02:00

id: compliance-manifests-pii-retention-subprocessors
title: Declarative compliance manifests (PII + retention + sub-processors) — Epic A of ADR-025
type: prd
status: approved
author: danijel
created: 2026-05-18T17:52:09Z
updated: 2026-05-18T17:55:38.523Z

Problem

A consumer adopting this template today gets the audit channel (ADR-018), observability PII boundary (ADR-017 §7), EU library residency filter (ADR-022), and supply-chain hardening (ADR-023) — about half the playbook's surface. But three load-bearing compliance artifacts that a DPA auditor (or a partner doing GDPR due diligence) expects are missing:

  • Data map — no structured inventory of "what personal data does this system hold, where, with what retention, exportable by whom"
  • Retention policy — no declarative source of truth for how long data lives per collection, no scheduled purge mechanism
  • Sub-processor inventory — no record of "which third-party services receive personal data, under what DPA, in what region"

Every downstream EU-bound consumer currently has to invent these three artifacts from scratch, and the cost of drift between code (what's actually PII) and documentation (what we say is PII) is high enough that most teams ship with both stale.

ADR-025 settled the strategy: tag at the Payload collection/field level + generators emit compliance/*.yml + extend ADR-022 traces for sub-processors. This PRD is the implementation seed for Epic A.

Goal

Ship the declarative compliance manifests + generators so downstream consumers get a complete, automatically-validated PII inventory, retention policy, and sub-processor record by editing source-of-truth Payload configs and ADR-022 library traces. Drift detection runs in pre-commit + CI; the consumer's compliance/ directory becomes audit evidence.

In scope

  • Type primitives in core-shared/payload/: PiiCategory, DataProcessingPurpose, RetentionAction, RetentionTrigger, FieldPii, CollectionRetention, AuthPiiDefaults
  • TypeScript ambient module declaration extending Payload's custom: Record<string, unknown> to type pii (per field) and retention (per collection)
  • Three generators under scripts/compliance/:
    • emit-data-map.mjs → compliance/data-map.yml
    • emit-retention-policy.mjs → compliance/retention-policy.yml
    • emit-sub-processors.mjs → compliance/sub-processors.yml
  • Orchestrator emit-all.mjs + pnpm compliance:* package scripts
  • --check mode on every generator for drift detection
  • Pre-commit hook integration: conditional (runs only when staged files match Payload configs or library traces) auto-regenerate + auto-stage
  • CI integration: pnpm compliance:emit-all --check step in ci.yml's validate job, hard-fail on drift
  • New conformance ESLint rule pii-declaration-must-be-complete (warn): flags custom.pii: {...} blocks missing required sub-fields
  • ADR-022 amendment: trace frontmatter gains discriminated-union sub-processor fields (is-sub-processor, processes-pii, conditional data-sent/region/dpa-signed/sccs-required/contact)
  • /evaluate-library skill update: prompts for the new fields during trace authoring
  • Background purge job in core-shared/payload/retention-purge/ using existing IJobQueue infrastructure; emits an audit entry per row purged
  • Backfill of existing template collections per Q6 of the Epic A grill:
    • auth.users: full PII tagging on displayName + custom.retention + custom.authPii overrides (if non-default)
    • All 6 existing collections: custom.retention declared
    • Other collections: PII tags only where unambiguous (e.g., media.media.uploadedBy if tracked)
  • docs/compliance/ reference files: data-map.example.yml, retention-policy.example.yml, sub-processors.example.yml, README.md explaining the docs/compliance/ (templates) vs root compliance/ (live artifacts) split

Out of scope

  • DSR scaffold (@repo/core-dsr, IDataExport/IDataDelete/IDataRectify/IProcessingRestriction) — Epic B
  • Consent abstraction (@repo/core-consent, IConsent, ConsentChecked brand, requiresConsent manifest field) — Epic B
  • Cookie consent UI component — Epic B
  • Security headers middleware — Epic C
  • Rate-limit primitive (IRateLimit, RateLimited brand) — Epic C
  • SBOM generation in CI — Epic C
  • Compliance fill-in docs (runbooks, policies, pre-launch checklist) — Epic D
  • Per-feature PII migration beyond what Q6 specifies — consumers ship their own
  • Pure-HTTP sub-processors with no library trace — allowed as hand-authored entries in compliance/sub-processors.manual.yml, which the generator merges into the output with a CI-visible "manual entry, no trace" flag; no scaffold for editing them is provided
  • Retention enforcement for non-Payload data stores (Redis, S3, log aggregators) — out of scope; the template doesn't ship those abstractions yet
  • Cross-region transfer assessment (Schrems II / TIA artifacts) — partially addressed via region field in sub-processor records, but DPIA-style transfer-impact-assessment docs are Epic D's territory

Constraints

  • ADR-025 — Epic A's strategy is settled there. Implementation may surface details ADR-025 didn't anticipate; flag those for amendment before proceeding.
  • ADR-022 — sub-processor trace fields extend ADR-022's frontmatter. The discriminated-union shape and the /evaluate-library skill prompts are amendments captured in this PRD.
  • ADR-018 — purge job emits IAuditLog.record({ action: "DELETE", reason: "retention-policy" }) per row purged. Uses the existing core-audit audit channel; doesn't introduce new audit semantics.
  • ADR-023 — pre-commit + CI integration follows the existing pattern (.husky/pre-commit already runs bump-updated-timestamps.mjs; ci.yml already has a validate job).
  • Manifest-first ordering — the PII type primitives + Payload ambient declaration are the "manifest" for this work; they land first.
  • Generator-first — Payload collections do NOT get hand-rolled scaffolding; existing collection files are modified in place per Q6.
  • core-shared must-have boundary — purge job lives in core-shared/payload/ because every template consumer uses Payload. Doesn't create a new optional core (per Q5).
  • No --no-verify — the pre-commit hook auto-regenerates compliance YAMLs; repo policy forbids bypassing it with --no-verify, and CI re-checks regardless.
  • Conventional Commits — every slice lands as one green commit per the established session convention.

Success criteria

  • pnpm compliance:emit-all produces three deterministic YAMLs at compliance/data-map.yml, compliance/retention-policy.yml, compliance/sub-processors.yml.
  • pnpm compliance:emit-all --check exits 0 when committed YAMLs match source declarations; exits non-zero with a readable diff otherwise.
  • Pre-commit hook auto-regenerates conditionally — staging only a non-Payload-config / non-trace file does NOT trigger the generators.
  • CI workflow (ci.yml's validate job) blocks merges with mismatched compliance YAMLs.
  • pii-declaration-must-be-complete ESLint rule fires on a custom.pii: { category: "contact-email" } (missing required sub-fields) in a synthetic Payload collection fixture.
  • auth.users has a complete custom.pii tag set: displayName tagged, email covered by PAYLOAD_AUTH_PII_DEFAULTS, password/salt/hash excluded by the same default.
  • All 6 existing template collections declare custom.retention.
  • A library trace authored via /evaluate-library with is-sub-processor: true triggers the conditional fields prompt; the trace fails validation if any are missing.
  • The retention purge job, when run via pnpm work dispatch or via local pnpm dev boot, schedules per-collection deletes; deleted rows produce audit entries with action: "DELETE" and reason: "retention-policy".
  • pnpm typecheck && pnpm lint && pnpm test && pnpm conformance && pnpm fallow:audit && pnpm compliance:emit-all --check all pass green at every commit boundary.

User stories

  1. As a template author, I want declarative PII + retention + sub-processor inventories so downstream consumers get audit evidence by editing source-of-truth configs.
  2. As a downstream consumer, I want to add custom.pii: { category: "contact-email", ... } to a Payload field and have it appear in compliance/data-map.yml after pnpm compliance:emit-all.
  3. As a downstream consumer, I want collection-level retention with cron-schedulable purge so old data gets deleted automatically without my writing a custom cron job.
  4. As a downstream consumer running a serverless deployment, I want purgeSchedule to be runnable by either a process-local scheduler or an external cron — the interface accepts both.
  5. As a downstream consumer, I want a CI gate that fails my PR if I add a new Payload field with a custom.pii declaration but forget to regenerate compliance/data-map.yml, so my audit evidence stays in sync.
  6. As a downstream consumer, I want is-sub-processor: true traces to drive compliance/sub-processors.yml automatically so I don't maintain a separate inventory.
  7. As an AI agent scaffolding a new Payload collection, I want the TypeScript types for custom.pii/custom.retention to be enforced by the compiler so I can't ship an invalid declaration.
  8. As an AI agent evaluating a new library via /evaluate-library, I want the skill to ask "is this a sub-processor?" upfront so I author a complete trace in one pass.
  9. As a compliance reviewer, I want compliance/sub-processors.yml to match DPA Section D so a regulator review surfaces zero discrepancies.
  10. As a template author, I want the existing auth.users collection backfilled with PII tags so the template has a working reference example (not just empty schema).
  11. As a template author, I want password/salt/hash/resetPasswordToken excluded from the data map by default so downstream consumers can't accidentally ship them as "exportable user data."

Implementation decisions

Module surface

  • @repo/core-shared modifications (must-have package):
    • New module core-shared/payload/pii-types.ts exporting PiiCategory, DataProcessingPurpose, RetentionAction, RetentionTrigger, FieldPii, AuthPiiDefaults, PAYLOAD_AUTH_PII_DEFAULTS
    • New module core-shared/payload/retention-types.ts exporting CollectionRetention, ISO8601Duration (helper)
    • New module core-shared/payload/retention-purge/ containing retention-purge.job.ts + unit test
    • Ambient TypeScript declaration extending Payload's Field and CollectionConfig custom?: {} to type pii?: FieldPii and retention?: CollectionRetention respectively
  • No new optional core packages — Epic A doesn't introduce core-retention or similar (per Q5)
  • @repo/core-eslint: new rule pii-declaration-must-be-complete (warn). Extends _manifest-ast.js with a Payload-collection field-parser
  • scripts/compliance/: 4 new mjs scripts (3 emitters + 1 orchestrator) following the established scripts/<topic>/ pattern
  • Existing feature packages (auth, blog, media, marketing-pages, navigation): Payload collection files modified in place to add custom.retention (all) and custom.pii (auth fully, others sparingly per Q6)
  • .claude/skills/evaluate-library/SKILL.md: updated with the two new prompt questions and the discriminated-union trace template
  • .husky/pre-commit: new conditional step for pnpm compliance:emit-all
  • .github/workflows/ci.yml: new step pnpm compliance:emit-all --check in validate job
  • package.json (root): new scripts compliance:data-map, compliance:retention-policy, compliance:sub-processors, compliance:emit-all

Type primitive contracts (decision-encoding inlined)

// core-shared/payload/pii-types.ts
export type PiiCategory =
  | "contact-email"
  | "contact-phone"
  | "contact-address"
  | "identification-name"
  | "identification-username"
  | "identification-government-id"
  | "auth-credential"
  | "auth-token"
  | "network-ip"
  | "network-user-agent"
  | "financial-info"
  | "behavioral-engagement"
  | "document-content"
  | "derived-metric"
  | (string & Record<never, never>); // open-union escape hatch: consumers may pass their own category strings while editors keep autocomplete for the literals above

export type DataProcessingPurpose =
  | "account-authentication"
  | "transactional-notifications"
  | "marketing-communications"
  | "analytics-aggregation"
  | "legal-compliance"
  | "service-delivery"
  | (string & Record<never, never>);

export type RetentionTrigger =
  | "from-creation"
  | "from-last-access"
  | "after-deletion";
export type RetentionAction = "hard-delete" | "pseudonymize";

export type FieldRetention = {
  duration: string; // ISO 8601 duration, e.g. "P30D"
  trigger: RetentionTrigger;
  action: RetentionAction;
};

export type FieldPii = {
  category: PiiCategory;
  purpose: DataProcessingPurpose[];
  retention?: FieldRetention; // optional; falls back to collection-level when omitted
  exportable: boolean;
  restrictable: boolean;
};

// PAYLOAD_AUTH_PII_DEFAULTS — applied automatically when `auth: true`
// `null` = excluded from data-map (security material, never PII-export)
// Consumer overrides via `custom.authPii: { email: { ...override }, totpSecret: null }`
export const PAYLOAD_AUTH_PII_DEFAULTS: Record<string, FieldPii | null> = {
  email: {
    category: "contact-email",
    purpose: ["account-authentication", "transactional-notifications"],
    exportable: true,
    restrictable: true,
  },
  password: null,
  salt: null,
  hash: null,
  resetPasswordToken: null,
  resetPasswordExpiration: null,
  loginAttempts: null,
  lockUntil: null,
  apiKey: null,
  apiKeyIndex: null,
};
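
// core-shared/payload/retention-types.ts
import type { RetentionAction } from "./pii-types";

// Named cadence or a cron expression; the intersection keeps the literals visible to autocomplete
export type PurgeSchedule = "daily" | "weekly" | "monthly" | (string & Record<never, never>);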

export type CollectionRetention = {
  activeRetention?: {
    duration: string;
    trigger: "from-creation" | "from-last-access";
  };
  postDeletion?: {
    duration: string;
    trigger: "after-deletion";
    action: RetentionAction;
  };
  purgeSchedule: PurgeSchedule;
  coldArchive?: { duration: string; trigger: "from-creation" };
};

Ambient module declaration:

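// core-shared/payload/payload-custom-ambient.d.ts
// The type imports make this file a module, so the block below augments
// "payload" rather than replacing its typings.
import type { FieldPii } from "./pii-types";
import type { CollectionRetention } from "./retention-types";

// Declaration merging applies only to interfaces; confirm that the installed
// Payload version declares Field and CollectionConfig as interfaces.
declare module "payload" {
  interface Field {
    custom?: {
      pii?: FieldPii;
      [key: string]: unknown;
    };
  }
  interface CollectionConfig {
    custom?: {
      retention?: CollectionRetention;
      authPii?: Record<string, FieldPii | null>;
      [key: string]: unknown;
    };
  }
}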
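
As an illustration of how these declarations might read in a collection file, here is a minimal sketch. The `newsletter-subscribers` collection, the `SketchField` and `SketchCollection` stand-in types, and the chosen durations are hypothetical; the PRD's types are copied locally (abbreviated) so the snippet compiles without Payload installed.

```typescript
// Abbreviated local copies of the PRD's types, so the sketch is self-contained.
type FieldPii = {
  category: string;
  purpose: string[];
  exportable: boolean;
  restrictable: boolean;
};
type CollectionRetention = {
  activeRetention?: { duration: string; trigger: "from-creation" | "from-last-access" };
  postDeletion?: {
    duration: string;
    trigger: "after-deletion";
    action: "hard-delete" | "pseudonymize";
  };
  purgeSchedule: string;
};

// Hypothetical stand-ins for Payload's field and collection shapes.
type SketchField = { name: string; type: string; custom?: { pii?: FieldPii } };
type SketchCollection = {
  slug: string;
  fields: SketchField[];
  custom?: { retention?: CollectionRetention };
};

// Hypothetical collection: one tagged PII field, one untagged field.
export const subscribers: SketchCollection = {
  slug: "newsletter-subscribers",
  custom: {
    retention: {
      activeRetention: { duration: "P2Y", trigger: "from-last-access" },
      postDeletion: { duration: "P30D", trigger: "after-deletion", action: "hard-delete" },
      purgeSchedule: "daily",
    },
  },
  fields: [
    {
      name: "email",
      type: "email",
      custom: {
        pii: {
          category: "contact-email",
          purpose: ["marketing-communications"],
          exportable: true,
          restrictable: true,
        },
      },
    },
    { name: "locale", type: "text" },
  ],
};

// Names of the fields a data-map generator would pick up (those carrying custom.pii).
export const taggedFields: string[] = subscribers.fields
  .filter((field) => field.custom?.pii !== undefined)
  .map((field) => field.name);
```

Only `email` carries a `custom.pii` block here, so it is the only field that would be expected to surface in the data map; `locale` stays out because it is untagged.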

Generator contracts

Each generator runs in one of three modes:

  • Default: emit YAML to compliance/<artifact>.yml, overwriting
  • --check: regenerate in-memory, diff against existing file, exit 0 on match, non-zero with diff on mismatch
  • --print: emit to stdout (for debugging)

YAML output is deterministic (sorted keys, normalized formatting, trailing newline) so identical source declarations always produce byte-identical output.
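
The determinism requirement and the --check comparison can be sketched as follows. This is an illustration under stated assumptions, not the generators' actual code: `sortKeysDeep`, `serialize`, and `check` are hypothetical names, and `JSON.stringify` stands in for whichever YAML serializer the generators end up using.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively rebuild objects with their keys in sorted order; arrays keep their order.
export function sortKeysDeep(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => sortKeysDeep(item));
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      const child = value[key];
      if (child !== undefined) sorted[key] = sortKeysDeep(child);
    }
    return sorted;
  }
  return value;
}

// Stand-in serializer: sorted keys, fixed indentation, trailing newline.
export function serialize(value: Json): string {
  return JSON.stringify(sortKeysDeep(value), null, 2) + "\n";
}

// --check: regenerate in memory and compare with the committed text.
// Returns the process exit code: 0 on match, 1 on drift.
export function check(committed: string, source: Json): 0 | 1 {
  return committed === serialize(source) ? 0 : 1;
}
```

Because key order is normalized before serialization, two collections that declare the same metadata in a different property order serialize to the same bytes, which is what lets --check use a plain string comparison.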

emit-data-map.mjs:

  • Walks every Payload collection across packages (uses @repo/cms to load all configs)
  • For each field in fields[], reads custom.pii if present, emits an entry
  • For collections with auth: true, applies PAYLOAD_AUTH_PII_DEFAULTS then overlays custom.authPii overrides; emits entries for non-null defaults
  • Output structure: per-collection block listing fields + their PII metadata
  • Excluded fields (e.g., password: null) are documented in a separate excluded: section per collection for audit transparency
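
The default-then-override step for auth collections could look roughly like the sketch below. `resolveAuthPii` is a hypothetical helper, the defaults map is an abbreviated copy of PAYLOAD_AUTH_PII_DEFAULTS, and the sketch assumes an override replaces a default entry wholesale rather than merging property by property.

```typescript
type FieldPii = {
  category: string;
  purpose: string[];
  exportable: boolean;
  restrictable: boolean;
};
type AuthPiiMap = Record<string, FieldPii | null>;

// Abbreviated copy of PAYLOAD_AUTH_PII_DEFAULTS (null = excluded from the data map).
const DEFAULTS: AuthPiiMap = {
  email: {
    category: "contact-email",
    purpose: ["account-authentication", "transactional-notifications"],
    exportable: true,
    restrictable: true,
  },
  password: null,
  salt: null,
  hash: null,
};

// Overlay custom.authPii on the defaults, then split the result into
// emitted entries and the per-collection `excluded:` list.
export function resolveAuthPii(overrides: AuthPiiMap = {}): {
  fields: Record<string, FieldPii>;
  excluded: string[];
} {
  const merged: AuthPiiMap = { ...DEFAULTS, ...overrides };
  const fields: Record<string, FieldPii> = {};
  const excluded: string[] = [];
  for (const name of Object.keys(merged).sort()) {
    const pii = merged[name];
    if (pii === null || pii === undefined) excluded.push(name);
    else fields[name] = pii;
  }
  return { fields, excluded };
}
```

Iterating over sorted names keeps both the emitted entries and the excluded list stable between runs, which matters for the byte-identical output requirement.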

emit-retention-policy.mjs:

  • Walks every Payload collection
  • Emits custom.retention block per collection
  • Validates: every collection MUST have purgeSchedule declared (failure: print collection name + hint)
  • Output structure: per-collection retention block with purge cadence + activeRetention + postDeletion + coldArchive
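
A minimal sketch of the purgeSchedule validation follows. The function names and the wording of the hint are assumptions; only the rule itself (every collection must declare purgeSchedule, and the failure names the collection) comes from the list above.

```typescript
type RetentionLike = { purgeSchedule?: string };
type CollectionLike = { slug: string; custom?: { retention?: RetentionLike } };

// Slugs of collections that lack custom.retention.purgeSchedule, sorted for stable output.
export function findMissingPurgeSchedule(collections: CollectionLike[]): string[] {
  return collections
    .filter((collection) => !collection.custom?.retention?.purgeSchedule)
    .map((collection) => collection.slug)
    .sort();
}

// One line per offending collection, each with a hint on how to fix it.
export function formatFailure(slugs: string[]): string {
  return slugs
    .map((slug) => `${slug}: missing custom.retention.purgeSchedule (hint: add e.g. "daily")`)
    .join("\n");
}
```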

emit-sub-processors.mjs:

  • Walks docs/library-decisions/*.md, parses frontmatter
  • Filters to traces with is-sub-processor: true
  • Emits each as a sub-processor entry with conditional fields (data-sent, region, dpa-signed, sccs-required, contact)
  • Also reads compliance/sub-processors.manual.yml (if exists) for pure-HTTP entries with no backing trace; merges into output with source: manual flag
  • Output structure: array of sub-processor records, sorted by name
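
The merge of trace-backed and manual entries might be sketched like this. The `name` and `region` fields on each record and the `source: "trace"` marker for trace-backed entries are assumptions for illustration; the list above only specifies `source: manual` and sorting by name.

```typescript
type SubProcessorInput = { name: string; region: string };
type SubProcessorEntry = SubProcessorInput & { source: "trace" | "manual" };

// Compare by code units rather than locale so the order is identical on every machine.
function byName(a: SubProcessorEntry, b: SubProcessorEntry): number {
  if (a.name < b.name) return -1;
  if (a.name > b.name) return 1;
  return 0;
}

// Tag each entry with its origin, then merge and sort by name.
export function mergeSubProcessors(
  fromTraces: SubProcessorInput[],
  manual: SubProcessorInput[],
): SubProcessorEntry[] {
  const traced = fromTraces.map((entry) => ({ ...entry, source: "trace" as const }));
  const handAuthored = manual.map((entry) => ({ ...entry, source: "manual" as const }));
  return [...traced, ...handAuthored].sort(byName);
}
```

Avoiding `localeCompare` is a deliberate choice in this sketch: locale-sensitive ordering can differ between a developer machine and CI, which would show up as spurious drift.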

emit-all.mjs:

  • Orchestrates the three; supports --check mode for all at once
  • Exits non-zero when any generator fails, reporting one failure per failed generator

ADR-022 amendment — trace frontmatter discriminated union

Every library trace at docs/library-decisions/<date>-<pkg>.md MUST declare two boolean fields after the existing ADR-022 fields:

  • is-sub-processor: boolean (does this library send data to an external server it owns/operates?)
  • processes-pii: boolean (does this library process personal data inside the calling process?)

When is-sub-processor: true, the following fields become REQUIRED:

  • data-sent: string[] (references PiiCategory values)
  • region: "EU" | "EEA" | "US" | "UK" | "CH" | "OTHER"
  • dpa-signed: ISO-date | null (null = pending)
  • sccs-required: boolean
  • contact: string (email or URL)

When is-sub-processor: false, these fields MUST be absent. Validator enforces.
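
The discriminated union and its validator can be sketched as below. `validateTrace` and the error wording are hypothetical; the field names, the region values, and the required-when-true / absent-when-false rule come from the lists above. A `dpa-signed: null` value counts as present, since null means "pending".

```typescript
const REGIONS = ["EU", "EEA", "US", "UK", "CH", "OTHER"] as const;
type Region = (typeof REGIONS)[number];

// The frontmatter shape as a discriminated union on is-sub-processor.
export type SubProcessorTrace =
  | { "is-sub-processor": false; "processes-pii": boolean }
  | {
      "is-sub-processor": true;
      "processes-pii": boolean;
      "data-sent": string[];
      region: Region;
      "dpa-signed": string | null;
      "sccs-required": boolean;
      contact: string;
    };

const CONDITIONAL = ["data-sent", "region", "dpa-signed", "sccs-required", "contact"] as const;

// Returns a list of problems; an empty list means the frontmatter is valid.
export function validateTrace(fm: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const flag = fm["is-sub-processor"];
  if (typeof flag !== "boolean") errors.push("is-sub-processor must be a boolean");
  if (typeof fm["processes-pii"] !== "boolean") errors.push("processes-pii must be a boolean");
  for (const key of CONDITIONAL) {
    const present = key in fm;
    if (flag === true && !present) errors.push(`${key} is required when is-sub-processor is true`);
    if (flag === false && present) errors.push(`${key} must be absent when is-sub-processor is false`);
  }
  if (flag === true && "region" in fm && !REGIONS.includes(fm["region"] as Region)) {
    errors.push(`region must be one of ${REGIONS.join(", ")}`);
  }
  return errors;
}
```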

The /evaluate-library skill prompts the user for both binary fields during evaluation; when is-sub-processor: true, also prompts for the 5 conditional fields. Trace template in .claude/skills/evaluate-library/SKILL.md updated accordingly.

The weekly trace revalidation cron (ADR-023) checks dpa-signed for staleness: DPA dates older than 2 years trigger a re-confirmation issue.
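
The two-year staleness rule reduces to a date comparison, sketched here. `isDpaStale` is a hypothetical helper, and treating a pending DPA (`null`) as "not stale" is an assumption; the revalidation job may want to handle pending entries separately.

```typescript
// True when the DPA was signed more than two years before `now`.
export function isDpaStale(dpaSigned: string | null, now: Date): boolean {
  if (dpaSigned === null) return false; // pending DPA: nothing to go stale yet (assumption)
  const signed = new Date(dpaSigned);
  if (Number.isNaN(signed.getTime())) {
    throw new Error(`dpa-signed is not a valid ISO date: ${dpaSigned}`);
  }
  // Cutoff is the same calendar instant two years earlier, computed in UTC.
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 2);
  return signed.getTime() < cutoff.getTime();
}
```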

Pre-commit hook integration

.husky/pre-commit gains a step that runs pnpm compliance:emit-all if and only if any staged file matches:

  • packages/*/src/integrations/cms/**/*.ts (Payload configs)
  • docs/library-decisions/*.md (library traces)
  • compliance/*.yml (the artifacts themselves — protects against manual edits)

Output is auto-staged via git add compliance/. The conditional check keeps unrelated commits fast (~10ms detection cost).
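
The conditional trigger amounts to matching staged paths against the three patterns above. The sketch below hand-translates the globs into regular expressions and is an illustration only; `shouldRegenerate` is a hypothetical name, and the staged list is assumed to come from something like `git diff --cached --name-only`.

```typescript
// Regex equivalents of the three trigger globs.
const TRIGGERS: RegExp[] = [
  // packages/*/src/integrations/cms/**/*.ts
  /^packages\/[^/]+\/src\/integrations\/cms\/(?:.+\/)?[^/]+\.ts$/,
  // docs/library-decisions/*.md
  /^docs\/library-decisions\/[^/]+\.md$/,
  // compliance/*.yml
  /^compliance\/[^/]+\.yml$/,
];

// True when at least one staged path should cause the generators to run.
export function shouldRegenerate(stagedPaths: string[]): boolean {
  return stagedPaths.some((path) => TRIGGERS.some((pattern) => pattern.test(path)));
}
```

Keeping the detection to a handful of regex tests over the staged list is what makes the "unrelated commits stay fast" property plausible: the generators never start unless a pattern matches.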

CI integration

.github/workflows/ci.yml's validate job gains a step:

- name: Compliance manifest drift check
  run: pnpm compliance:emit-all --check

Position: after pnpm conformance, before pnpm coverage:diff. Same severity (hard error). Failure message includes the fix command.

Background retention purge job

Lives at core-shared/payload/retention-purge/retention-purge.job.ts. Receives ctx.queue (IJobQueue from core-shared/jobs) + ctx.config (Payload SanitizedConfig).

At app boot, the binder walks every collection, reads custom.retention.purgeSchedule, registers a scheduled job per collection with the corresponding cadence. The job body:

  1. Queries the collection for rows whose activeRetention.duration has elapsed (from createdAt for from-creation, from updatedAt for from-last-access)
  2. For each row:
    • If postDeletion.action === "pseudonymize": NULL the PII fields, set processing_restricted: true (per Epic B's IProcessingRestriction)
    • If postDeletion.action === "hard-delete": cascade delete via Payload's delete operation
  3. Emits one audit entry per processed row: IAuditLog.record({ action: "DELETE", subject: row.id, actor: "system", reason: "retention-policy" })

When auditLog isn't wired (consumer hasn't scaffolded core-audit), the audit emission is skipped without throwing.
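
Step 1 of the job body (selecting rows whose retention window has elapsed) and the audit entry from step 3 can be sketched as pure functions. This is a sketch under assumptions: `cutoffFor`, `expiredRows`, and `toAuditEntry` are hypothetical names, and the duration parser accepts only the date part of ISO 8601 (`PnYnMnWnD`), not time components.

```typescript
// Date-only ISO 8601 durations, e.g. "P30D", "P2Y", "P1Y6M".
const DURATION = /^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$/;

// The instant before which a row counts as expired.
export function cutoffFor(duration: string, now: Date): Date {
  const match = DURATION.exec(duration);
  if (match === null || duration === "P") {
    throw new Error(`Unsupported ISO 8601 duration: ${duration}`);
  }
  const part = (index: number): number => Number(match[index] ?? 0);
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - part(1));
  cutoff.setUTCMonth(cutoff.getUTCMonth() - part(2));
  cutoff.setUTCDate(cutoff.getUTCDate() - (part(3) * 7 + part(4)));
  return cutoff;
}

type Row = { id: string; createdAt: string; updatedAt: string };
type ActiveRetention = { duration: string; trigger: "from-creation" | "from-last-access" };

// from-creation reads createdAt; from-last-access reads updatedAt.
export function expiredRows(rows: Row[], retention: ActiveRetention, now: Date): Row[] {
  const cutoff = cutoffFor(retention.duration, now).getTime();
  return rows.filter((row) => {
    const timestamp = retention.trigger === "from-creation" ? row.createdAt : row.updatedAt;
    return new Date(timestamp).getTime() <= cutoff;
  });
}

// Shape of the audit entry emitted per processed row.
export function toAuditEntry(row: Row) {
  return { action: "DELETE", subject: row.id, actor: "system", reason: "retention-policy" } as const;
}
```

Keeping the selection logic free of Payload and queue dependencies makes it straightforward to cover in the unit suite with fixed clock values.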

Backfill scope (per Q6)

Stories cover backfill of existing template collections:

  • auth.users: displayName tagged as identification-username (exportable, restrictable); role left untagged (not PII per template default). Collection retention: activeRetention omitted (indefinite), postDeletion: P30D hard-delete, purgeSchedule: daily. PAYLOAD_AUTH_PII_DEFAULTS covers email/password/salt/hash automatically; no custom.authPii override is needed unless custom auth fields are added later.
  • blog.articles: collection retention only (no PII fields by default — author refs not modeled as PII per template).
  • marketing-pages.site-settings and marketing-pages.pages: collection retention only (no PII).
  • media.media: collection retention; if uploadedBy exists, tag it as identification-username.
  • navigation.header: collection retention only.

Each backfill is one slice = one commit.

Conformance impact

  • ESLint rule count: 10 → 11 (adds pii-declaration-must-be-complete)
  • Manifest fields: unchanged (no per-use-case manifest fields added by Epic A — declarations are on Payload configs, not feature manifests)
  • New brand: none in Epic A (brands are Epic B/C territory)
  • Boot assertion: extended to validate custom.retention.purgeSchedule is parseable when the binder boots in production
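
The completeness check at the heart of pii-declaration-must-be-complete can be illustrated on plain objects. The real rule would inspect AST nodes through the _manifest-ast.js helpers; `missingPiiSubFields` is a hypothetical function, and the required set below is taken from the non-optional members of FieldPii.

```typescript
// Non-optional members of FieldPii; `retention` is optional and therefore not listed.
const REQUIRED_PII_KEYS = ["category", "purpose", "exportable", "restrictable"] as const;

// Names of required sub-fields that a custom.pii block leaves out.
export function missingPiiSubFields(pii: Record<string, unknown>): string[] {
  return REQUIRED_PII_KEYS.filter((key) => !(key in pii));
}
```

For the fixture named in the success criteria, `custom.pii: { category: "contact-email" }`, this reports `purpose`, `exportable`, and `restrictable` as missing, which is the condition under which the rule would warn.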

Testing decisions

  • Type primitives: vitest tests on core-shared/payload/pii-types.ts and retention-types.ts that verify the TS shape via @ts-expect-error on malformed declarations and verify the exported defaults.
  • Generators: each script gets unit tests covering: happy path, --check matches, --check mismatch with readable diff, empty input (no collections), auth-managed defaults applied, custom.authPii overrides applied, sub-processor frontmatter discriminated union parsing.
  • ESLint rule: RuleTester-based fixture suite mirroring no-undeclared-audit.test.js. Cover: complete custom.pii passes, missing category fires, missing purpose fires, missing exportable fires, non-PII collection no-op, malformed YAML in trace no-op.
  • Retention purge job: unit test with in-memory Payload mock; verify schedule registration, row matching, audit emission, pseudonymize vs hard-delete branches, optional auditLog graceful skip.
  • Integration: e2e test using the existing dev-seed setup: declare a custom.retention on a test collection, run pnpm compliance:retention-policy --check, and assert the output. Repeat with a forced manual edit to cover the --check mismatch path.
  • No repository contract suite — Epic A doesn't introduce a new IXRepository.
  • Coverage: Epic A's modules join the L0 vitest thresholds (per coverage.bands in the affected packages' manifests); L1 pnpm coverage:diff gates the slices in dispatch.
  • Prior art to mirror:
    • Generator + --check pattern: scripts/coverage/diff.mjs (similar diff-against-committed-output pattern)
    • ESLint rule shape: packages/core-eslint/rules/no-undeclared-audit.{js,test.js}
    • Background job in core-shared: existing packages/core-shared/src/jobs/payload-job-queue.{ts,test.ts}
    • Ambient TypeScript module augmentation: existing patterns in node_modules/@types/* (search for declare module)

Open questions

  • Q1: Should retention-must-be-declared ESLint rule join pii-declaration-must-be-complete? — Recommended: No, defer. Q6 already establishes that every existing template collection gets custom.retention; making it ESLint-enforced for all Payload collections in all downstream consumers is stricter than the strategy ADR. Add the rule in a follow-up PRD if a consumer feels the pain.
  • Q2: How does from-last-access retention trigger interact with Payload — does Payload track updatedAt natively? — Payload sets createdAt and updatedAt by default. from-last-access reads updatedAt. If a consumer needs true "last read" tracking (rare), they add a custom hook updating lastAccessedAt; the purge job uses that field via custom.retention.lastAccessFieldOverride (deferred — not in Epic A).
  • Q3: Should the generator emit compliance/manifest.lock.yml containing a hash of source declarations, for fast --check mode? — Recommended: No, defer. Full regenerate-and-diff is fast enough (~50ms for the template's 6 collections). Reconsider if generators grow slow on a real consumer with hundreds of collections.
  • Q4: How does the purge job handle race conditions between concurrent purge runs (e.g., misconfigured cron firing daily and weekly simultaneously)? — Recommended: per-job advisory lock via Payload's job system (existing IJobQueue should support this); test for it in the job's unit suite. If not supported, document as a known limitation in the consumer's runbook.
  • Q5: Does the pre-commit hook trigger on compliance/*.yml edits themselves (the auto-stage behavior could feedback-loop)? — Recommended: Yes, conditionally. The hook runs emit-all when compliance/*.yml is staged because that's the manual-edit case; the generator regenerates the YAML, auto-stages the regenerated version, replacing the developer's manual edit. The developer sees this in their git status post-commit and can re-edit if they had intent. Avoids drift via manual edit silently surviving.

Out of scope (deferred)

  • DSR scaffold and consent abstraction (Epic B)
  • Security headers + rate-limit + SBOM (Epic C)
  • Compliance fill-in docs (Epic D)
  • retention-must-be-declared ESLint rule (see Q1 above)
  • lastAccessedAt field hook + true "from-last-access" retention (see Q2)
  • compliance/manifest.lock.yml for faster --check (see Q3)
  • Backfill of any Payload auth: true collection beyond users (none exist yet)
  • Cross-region transfer documentation (DPIA / TIA) — Epic D's territory
  • Migration tooling for downstream consumers upgrading from a pre-ADR-025 version of the template (the template hasn't been versioned with consumers yet; no migration needed)

Further notes

  • Builds on: ADR-018 (audit channel — purge emits audit entries), ADR-022 (library evaluation policy — extended for sub-processor fields), ADR-023 (CI security + supply chain — pre-commit + CI integration follows the established pattern), ADR-025 (strategy umbrella).
  • Pairs with: Epic B PRD dsr-consent-and-cookie-banner.prd.md (consumes A's PII tags for DSR cascade); Epic D PRD compliance-docs-scaffolds.prd.md (references A's generator output formats in .example.yml files).
  • Sequencing: Epic A is the dependency-graph root for Epic B. Epic B's PRD will be authored once Epic A's stories are at least partially dispatched (decomposer needs A's type primitives to settle B's interfaces).
  • Stakeholders: template authors (most affected — three new generators, ESLint rule, ADR-022 amendment), downstream consumers (positively affected — gain three compliance artifacts), AI agents operating in feature code (positively affected — typed Payload custom config catches invalid declarations at compile time), compliance reviewers (positively affected — compliance/ directory becomes audit evidence).