Implementation seed for ADR-025 Epic A: declarative PII inventory + retention + sub-processor manifests with three generators, pre-commit + CI drift detection, background purge job, and ADR-022 trace frontmatter extension for sub-processor fields (discriminated union). Eleven user stories ordered for the decomposer; five open questions with recommendations. Status: approved — ready for pnpm work decompose.
| id | title | type | status | author | created | updated |
|---|---|---|---|---|---|---|
| compliance-manifests-pii-retention-subprocessors | Declarative compliance manifests (PII + retention + sub-processors) — Epic A of ADR-025 | prd | approved | danijel | 2026-05-18T17:52:09Z | 2026-05-18T17:55:38.523Z |
Problem
A consumer adopting this template today gets the audit channel (ADR-018), observability PII boundary (ADR-017 §7), EU library residency filter (ADR-022), and supply-chain hardening (ADR-023) — about half the playbook's surface. But three load-bearing compliance artifacts that a DPA auditor (or a partner doing GDPR due diligence) expects are missing:
- Data map — no structured inventory of "what personal data does this system hold, where, with what retention, exportable by whom"
- Retention policy — no declarative source of truth for how long data lives per collection, no scheduled purge mechanism
- Sub-processor inventory — no record of "which third-party services receive personal data, under what DPA, in what region"
Every downstream EU-bound consumer currently has to invent these three artifacts from scratch, and the cost of drift between code (what's actually PII) and documentation (what we say is PII) is high enough that most teams ship with both stale.
ADR-025 settled the strategy: tag at the Payload collection/field level + generators emit compliance/*.yml + extend ADR-022 traces for sub-processors. This PRD is the implementation seed for Epic A.
Goal
Ship the declarative compliance manifests + generators so downstream consumers get a complete, automatically-validated PII inventory, retention policy, and sub-processor record by editing source-of-truth Payload configs and ADR-022 library traces. Drift detection runs in pre-commit + CI; the consumer's compliance/ directory becomes audit evidence.
In scope
- Type primitives in `core-shared/payload/`: `PiiCategory`, `DataProcessingPurpose`, `RetentionAction`, `RetentionTrigger`, `FieldPii`, `CollectionRetention`, `AuthPiiDefaults`
- TypeScript ambient module declaration extending Payload's `custom: Record<string, unknown>` to type `pii` (per field) and `retention` (per collection)
- Three generators under `scripts/compliance/`:
  - `emit-data-map.mjs` → `compliance/data-map.yml`
  - `emit-retention-policy.mjs` → `compliance/retention-policy.yml`
  - `emit-sub-processors.mjs` → `compliance/sub-processors.yml`
- Orchestrator `emit-all.mjs` + `pnpm compliance:*` package scripts
- `--check` mode on every generator for drift detection
- Pre-commit hook integration: conditional (runs only when staged files match Payload configs or library traces) auto-regenerate + auto-stage
- CI integration: `pnpm compliance:emit-all --check` step in `ci.yml`'s validate job, hard-fail on drift
- New conformance ESLint rule `pii-declaration-must-be-complete` (warn): flags `custom.pii: {...}` blocks missing required sub-fields
- ADR-022 amendment: trace frontmatter gains discriminated-union sub-processor fields (`is-sub-processor`, `processes-pii`, conditional `data-sent` / `region` / `dpa-signed` / `sccs-required` / `contact`)
- `/evaluate-library` skill update: prompts for the new fields during trace authoring
- Background purge job in `core-shared/payload/retention-purge/` using existing `IJobQueue` infrastructure; emits an audit entry per row purged
- Backfill of existing template collections per Q6 of the Epic A grill:
  - `auth.users`: full PII tagging on `displayName` + `custom.retention` + `custom.authPii` overrides (if non-default)
  - All 6 existing collections: `custom.retention` declared
  - Other collections: PII tags only where unambiguous (e.g., `media.media.uploadedBy` if tracked)
- `docs/compliance/` reference files: `data-map.example.yml`, `retention-policy.example.yml`, `sub-processors.example.yml`, `README.md` explaining the `docs/compliance/` (templates) vs root `compliance/` (live artifacts) split
Out of scope
- DSR scaffold (`@repo/core-dsr`, `IDataExport` / `IDataDelete` / `IDataRectify` / `IProcessingRestriction`): Epic B
- Consent abstraction (`@repo/core-consent`, `IConsent`, `ConsentChecked` brand, `requiresConsent` manifest field): Epic B
- Cookie consent UI component: Epic B
- Security headers middleware: Epic C
- Rate-limit primitive (`IRateLimit`, `RateLimited` brand): Epic C
- SBOM generation in CI: Epic C
- Compliance fill-in docs (runbooks, policies, pre-launch checklist): Epic D
- Per-feature PII migration beyond what Q6 specifies: consumers ship their own
- Pure-HTTP sub-processors with no library trace: allowed as hand-authored entries in `compliance/sub-processors.manual.yml`, which the generator merges into `compliance/sub-processors.yml` as "manual entry, no trace" with a CI-visible flag; there is no scaffold for editing them
- Retention enforcement for non-Payload data stores (Redis, S3, log aggregators): the template doesn't ship those abstractions yet
- Cross-region transfer assessment (Schrems II / TIA artifacts): partially addressed via the `region` field in sub-processor records, but DPIA-style transfer-impact-assessment docs are Epic D's territory
Constraints
- ADR-025: Epic A's strategy is settled there. Implementation may surface details ADR-025 didn't anticipate; flag those for amendment before proceeding.
- ADR-022: sub-processor trace fields extend ADR-022's frontmatter. The discriminated-union shape and the `/evaluate-library` skill prompts are amendments captured in this PRD.
- ADR-018: purge job emits `IAuditLog.record({ action: "DELETE", reason: "retention-policy" })` per row purged. Uses the existing `core-audit` audit channel; doesn't introduce new audit semantics.
- ADR-023: pre-commit + CI integration follows the existing pattern (`.husky/pre-commit` already runs `bump-updated-timestamps.mjs`; `ci.yml` already has a `validate` job).
- Manifest-first ordering: the PII type primitives + Payload ambient declaration are the "manifest" for this work; they land first.
- Generator-first: Payload collections do NOT get hand-rolled scaffolding; existing collection files are modified in place per Q6.
- `core-shared` must-have boundary: purge job lives in `core-shared/payload/` because every template consumer uses Payload. Doesn't create a new optional core (per Q5).
- No `--no-verify`: pre-commit hook auto-regenerates compliance YAMLs; developers cannot bypass with `--no-verify` per repo policy. CI re-checks anyway.
- Conventional Commits: every slice lands as one green commit per the established session convention.
Success criteria
- `pnpm compliance:emit-all` produces three deterministic YAMLs at `compliance/data-map.yml`, `compliance/retention-policy.yml`, `compliance/sub-processors.yml`.
- `pnpm compliance:emit-all --check` exits 0 when committed YAMLs match source declarations; exits non-zero with a readable diff otherwise.
- Pre-commit hook auto-regenerates conditionally: staging only a non-Payload-config / non-trace file does NOT trigger the generators.
- CI workflow (`ci.yml`'s validate job) blocks merges with mismatched compliance YAMLs.
- `pii-declaration-must-be-complete` ESLint rule fires on a `custom.pii: { category: "contact-email" }` (missing required sub-fields) in a synthetic Payload collection fixture.
- `auth.users` has a complete `custom.pii` tag set: `displayName` tagged, `email` covered by `PAYLOAD_AUTH_PII_DEFAULTS`, `password` / `salt` / `hash` excluded by the same default.
- All 6 existing template collections declare `custom.retention`.
- A library trace authored via `/evaluate-library` with `is-sub-processor: true` triggers the conditional fields prompt; the trace fails validation if any are missing.
- The retention purge job, when run via `pnpm work dispatch` or via local `pnpm dev` boot, schedules per-collection deletes; deleted rows produce audit entries with `action: "DELETE"` and `reason: "retention-policy"`.
- `pnpm typecheck && pnpm lint && pnpm test && pnpm conformance && pnpm fallow:audit && pnpm compliance:emit-all --check` all pass green at every commit boundary.
User stories
- As a template author, I want declarative PII + retention + sub-processor inventories so downstream consumers get audit evidence by editing source-of-truth configs.
- As a downstream consumer, I want to add `custom.pii: { category: "contact-email", ... }` to a Payload field and have it appear in `compliance/data-map.yml` after `pnpm compliance:emit-all`.
- As a downstream consumer, I want collection-level retention with cron-schedulable purge so old data gets deleted automatically without my writing a custom cron job.
- As a downstream consumer running a serverless deployment, I want `purgeSchedule` to be runnable by either a process-local scheduler or an external cron; the interface accepts both.
- As a downstream consumer, I want a CI gate that fails my PR if I add a new Payload field with a `custom.pii` declaration but forget to regenerate `compliance/data-map.yml`, so my audit evidence stays in sync.
- As a downstream consumer, I want `is-sub-processor: true` traces to drive `compliance/sub-processors.yml` automatically so I don't maintain a separate inventory.
- As an AI agent scaffolding a new Payload collection, I want the TypeScript types for `custom.pii` / `custom.retention` to be enforced by the compiler so I can't ship an invalid declaration.
- As an AI agent evaluating a new library via `/evaluate-library`, I want the skill to ask "is this a sub-processor?" upfront so I author a complete trace in one pass.
- As a compliance reviewer, I want `compliance/sub-processors.yml` to match DPA Section D so a regulator review surfaces zero discrepancies.
- As a template author, I want the existing `auth.users` collection backfilled with PII tags so the template has a working reference example (not just empty schema).
- As a template author, I want `password` / `salt` / `hash` / `resetPasswordToken` excluded from the data map by default so downstream consumers can't accidentally ship them as "exportable user data."
Implementation decisions
Module surface
- `@repo/core-shared` modifications (must-have package):
  - New module `core-shared/payload/pii-types.ts` exporting `PiiCategory`, `DataProcessingPurpose`, `RetentionAction`, `RetentionTrigger`, `FieldPii`, `AuthPiiDefaults`, `PAYLOAD_AUTH_PII_DEFAULTS`
  - New module `core-shared/payload/retention-types.ts` exporting `CollectionRetention`, `ISO8601Duration` (helper)
  - New module `core-shared/payload/retention-purge/` containing `retention-purge.job.ts` + unit test
  - Ambient TypeScript declaration extending Payload's `Field` and `CollectionConfig` `custom?: {}` to type `pii?: FieldPii` and `retention?: CollectionRetention` respectively
- No new optional core packages: Epic A doesn't introduce `core-retention` or similar (per Q5)
- `@repo/core-eslint`: new rule `pii-declaration-must-be-complete` (warn). Extends `_manifest-ast.js` with a Payload-collection field-parser
- `scripts/compliance/`: 4 new mjs scripts (3 emitters + 1 orchestrator) following the established `scripts/<topic>/` pattern
- Existing feature packages (auth, blog, media, marketing-pages, navigation): Payload collection files modified in place to add `custom.retention` (all) and `custom.pii` (auth fully, others sparingly per Q6)
- `.claude/skills/evaluate-library/SKILL.md`: updated with the two new prompt questions and the discriminated-union trace template
- `.husky/pre-commit`: new conditional step for `pnpm compliance:emit-all`
- `.github/workflows/ci.yml`: new step `pnpm compliance:emit-all --check` in `validate` job
- `package.json` (root): new scripts `compliance:data-map`, `compliance:retention-policy`, `compliance:sub-processors`, `compliance:emit-all`
Type primitive contracts (decision-encoding inlined)
// core-shared/payload/pii-types.ts
export type PiiCategory =
  | "contact-email"
  | "contact-phone"
  | "contact-address"
  | "identification-name"
  | "identification-username"
  | "identification-government-id"
  | "auth-credential"
  | "auth-token"
  | "network-ip"
  | "network-user-agent"
  | "financial-info"
  | "behavioral-engagement"
  | "document-content"
  | "derived-metric"
  | (string & Record<never, never>); // open-union escape hatch: consumers may use any string, known literals keep autocomplete

export type DataProcessingPurpose =
  | "account-authentication"
  | "transactional-notifications"
  | "marketing-communications"
  | "analytics-aggregation"
  | "legal-compliance"
  | "service-delivery"
  | (string & Record<never, never>);

export type RetentionTrigger =
  | "from-creation"
  | "from-last-access"
  | "after-deletion";

export type RetentionAction = "hard-delete" | "pseudonymize";

export type FieldRetention = {
  duration: string; // ISO 8601 duration, e.g. "P30D"
  trigger: RetentionTrigger;
  action: RetentionAction;
};

export type FieldPii = {
  category: PiiCategory;
  purpose: DataProcessingPurpose[];
  retention?: FieldRetention; // optional; falls back to collection-level when omitted
  exportable: boolean;
  restrictable: boolean;
};

// PAYLOAD_AUTH_PII_DEFAULTS: applied automatically when `auth: true`
// `null` = excluded from data-map (security material, never PII-export)
// Consumer overrides via `custom.authPii: { email: { ...override }, totpSecret: null }`
export const PAYLOAD_AUTH_PII_DEFAULTS: Record<string, FieldPii | null> = {
  email: {
    category: "contact-email",
    purpose: ["account-authentication", "transactional-notifications"],
    exportable: true,
    restrictable: true,
  },
  password: null,
  salt: null,
  hash: null,
  resetPasswordToken: null,
  resetPasswordExpiration: null,
  loginAttempts: null,
  lockUntil: null,
  apiKey: null,
  apiKeyIndex: null,
};

// core-shared/payload/retention-types.ts
import type { RetentionAction } from "./pii-types";

// Named presets, or any other string interpreted as a cron expression.
// The intersection keeps the presets visible to autocomplete instead of collapsing to `string`.
export type PurgeSchedule =
  | "daily"
  | "weekly"
  | "monthly"
  | (string & Record<never, never>);

export type CollectionRetention = {
  activeRetention?: {
    duration: string;
    trigger: "from-creation" | "from-last-access";
  };
  postDeletion?: {
    duration: string;
    trigger: "after-deletion";
    action: RetentionAction;
  };
  purgeSchedule: PurgeSchedule;
  coldArchive?: { duration: string; trigger: "from-creation" };
};
Ambient module declaration:
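// core-shared/payload/payload-custom-ambient.d.ts
import type { FieldPii } from "./pii-types";
import type { CollectionRetention } from "./retention-types";

// The imports above make this file a module, so the block below augments
// the "payload" module instead of replacing its declarations.
declare module "payload" {
  interface Field {
    custom?: {
      pii?: FieldPii;
      [key: string]: unknown;
    };
  }
  interface CollectionConfig {
    custom?: {
      retention?: CollectionRetention;
      authPii?: Record<string, FieldPii | null>;
      [key: string]: unknown;
    };
  }
}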
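As a rough illustration of how a consumer might use these declarations, the sketch below tags one field and declares collection retention. The `newsletter-subscribers` collection, the `SketchField` / `SketchCollection` shapes, and the trimmed local type copies are hypothetical stand-ins so the example compiles without Payload installed; they are not part of the template.

```typescript
// Trimmed local copies of the contracts above, so the sketch is self-contained.
type FieldPii = {
  category: string;
  purpose: string[];
  exportable: boolean;
  restrictable: boolean;
};
type CollectionRetention = {
  activeRetention?: { duration: string; trigger: "from-creation" | "from-last-access" };
  postDeletion?: { duration: string; trigger: "after-deletion"; action: "hard-delete" | "pseudonymize" };
  purgeSchedule: string;
};

// Hypothetical stand-ins for Payload's field and collection config shapes.
type SketchField = { name: string; type: string; custom?: { pii?: FieldPii } };
type SketchCollection = {
  slug: string;
  fields: SketchField[];
  custom?: { retention?: CollectionRetention };
};

// Hypothetical collection, for illustration only.
const newsletterSubscribers: SketchCollection = {
  slug: "newsletter-subscribers",
  custom: {
    retention: {
      activeRetention: { duration: "P2Y", trigger: "from-last-access" },
      postDeletion: { duration: "P30D", trigger: "after-deletion", action: "hard-delete" },
      purgeSchedule: "daily",
    },
  },
  fields: [
    {
      name: "email",
      type: "email",
      custom: {
        pii: {
          category: "contact-email",
          purpose: ["marketing-communications"],
          exportable: true,
          restrictable: true,
        },
      },
    },
    { name: "locale", type: "text" }, // no custom.pii, so it would not appear in the data map
  ],
};

// Names of the fields a data-map generator would pick up.
const taggedFields = newsletterSubscribers.fields
  .filter((field) => field.custom?.pii !== undefined)
  .map((field) => field.name);
```

Only `email` carries a `custom.pii` block here, so it is the only field that would reach `compliance/data-map.yml`.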
Generator contracts
Each generator runs in one of three modes:
- Default: emit YAML to `compliance/<artifact>.yml`, overwriting
- `--check`: regenerate in-memory, diff against existing file, exit 0 on match, non-zero with diff on mismatch
- `--print`: emit to stdout (for debugging)
YAML output is deterministic (sorted keys, normalized formatting, trailing newline), so identical source declarations always produce byte-identical output.
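The determinism and `--check` behaviour can be sketched in a few lines. This is a minimal illustration, not the generators' actual code: it serializes to JSON as a stand-in for YAML, and `sortKeysDeep`, `serialize`, and `checkDrift` are hypothetical names.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively rebuild objects with keys in sorted order; arrays keep their order.
function sortKeysDeep(value: Json): Json {
  if (Array.isArray(value)) return value.map(sortKeysDeep);
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key] as Json);
    }
    return sorted;
  }
  return value;
}

// Stand-in serializer (JSON instead of YAML) that always ends with one newline.
function serialize(value: Json): string {
  return JSON.stringify(sortKeysDeep(value), null, 2) + "\n";
}

type CheckResult = { exitCode: 0 | 1; diff: string[] };

// --check: compare freshly generated text against the committed text, line by line.
function checkDrift(generated: string, committed: string): CheckResult {
  if (generated === committed) return { exitCode: 0, diff: [] };
  const committedLines = committed.split("\n");
  const generatedLines = generated.split("\n");
  const diff: string[] = [];
  const length = Math.max(committedLines.length, generatedLines.length);
  for (let i = 0; i < length; i++) {
    const before = committedLines[i];
    const after = generatedLines[i];
    if (before === after) continue;
    if (before !== undefined) diff.push(`- ${before}`);
    if (after !== undefined) diff.push(`+ ${after}`);
  }
  return { exitCode: 1, diff };
}
```

Sorting keys before serializing is what makes the comparison in `checkDrift` meaningful: without it, two runs over the same declarations could differ only in key order and still report drift.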
emit-data-map.mjs:
- Walks every Payload collection across packages (uses `@repo/cms` to load all configs)
- For each field in `fields[]`, reads `custom.pii` if present, emits an entry
- For collections with `auth: true`, applies `PAYLOAD_AUTH_PII_DEFAULTS` then overlays `custom.authPii` overrides; emits entries for non-null defaults
- Output structure: per-collection block listing fields + their PII metadata
- Excluded fields (e.g., `password: null`) are documented in a separate `excluded:` section per collection for audit transparency
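The default-then-override step for `auth: true` collections might look like the sketch below. It assumes overrides replace a default entry wholesale rather than merging into it, which the source does not specify; `resolveAuthPii` and the trimmed `DEFAULTS` map are hypothetical.

```typescript
type FieldPii = {
  category: string;
  purpose: string[];
  exportable: boolean;
  restrictable: boolean;
};
type AuthPiiMap = Record<string, FieldPii | null>;

// Trimmed copy of PAYLOAD_AUTH_PII_DEFAULTS, enough for the sketch.
const DEFAULTS: AuthPiiMap = {
  email: {
    category: "contact-email",
    purpose: ["account-authentication", "transactional-notifications"],
    exportable: true,
    restrictable: true,
  },
  password: null,
  salt: null,
  hash: null,
};

// Overlay consumer overrides on the defaults (assumption: whole-entry replacement),
// then split the result into emitted entries and documented exclusions.
function resolveAuthPii(defaults: AuthPiiMap, overrides: AuthPiiMap = {}) {
  const merged: AuthPiiMap = { ...defaults, ...overrides };
  const included: Record<string, FieldPii> = {};
  const excluded: string[] = [];
  for (const name of Object.keys(merged).sort()) {
    const entry = merged[name];
    if (entry === null || entry === undefined) excluded.push(name);
    else included[name] = entry;
  }
  return { included, excluded };
}

// A consumer that adds a TOTP secret and marks it as security material.
const resolved = resolveAuthPii(DEFAULTS, { totpSecret: null });
```

Keeping the `null` entries in an `excluded` list, instead of dropping them, is what lets the generator document them in the per-collection `excluded:` section.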
emit-retention-policy.mjs:
- Walks every Payload collection
- Emits `custom.retention` block per collection
- Validates: every collection MUST have `purgeSchedule` declared (failure: print collection name + hint)
- Output structure: per-collection retention block with purge cadence + activeRetention + postDeletion + coldArchive
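The `purgeSchedule` validation could be as small as the following sketch. `CollectionLike` and `findMissingPurgeSchedules` are hypothetical names, and the hint wording is illustrative only.

```typescript
// Hypothetical minimal view of a loaded Payload collection config.
type CollectionLike = {
  slug: string;
  custom?: { retention?: { purgeSchedule?: string } };
};

// Return one readable message per collection that lacks custom.retention.purgeSchedule.
function findMissingPurgeSchedules(collections: CollectionLike[]): string[] {
  return collections
    .filter((collection) => !collection.custom?.retention?.purgeSchedule)
    .map(
      (collection) =>
        `${collection.slug}: missing custom.retention.purgeSchedule (hint: declare e.g. purgeSchedule: "daily")`,
    )
    .sort();
}
```

A generator would print the returned messages and exit non-zero when the list is not empty.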
emit-sub-processors.mjs:
- Walks `docs/library-decisions/*.md`, parses frontmatter
- Filters to traces with `is-sub-processor: true`
- Emits each as a sub-processor entry with conditional fields (`data-sent`, `region`, `dpa-signed`, `sccs-required`, `contact`)
- Also reads `compliance/sub-processors.manual.yml` (if exists) for pure-HTTP entries with no backing trace; merges into output with `source: manual` flag
- Output structure: array of sub-processor records, sorted by name
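The filter, merge, and sort steps can be sketched as below. The `name` frontmatter key and the `source: "trace"` value are assumptions made for the example (the source only names the `source: manual` flag), and `buildSubProcessors` is a hypothetical helper.

```typescript
// Hypothetical parsed frontmatter; only the fields the sketch needs.
type TraceFrontmatter = { name: string; "is-sub-processor": boolean; region: string };
type ManualEntry = { name: string; region: string };
type SubProcessorRecord = { name: string; region: string; source: "trace" | "manual" };

function buildSubProcessors(
  traces: TraceFrontmatter[],
  manualEntries: ManualEntry[],
): SubProcessorRecord[] {
  // Keep only traces that declare themselves sub-processors.
  const fromTraces: SubProcessorRecord[] = traces
    .filter((trace) => trace["is-sub-processor"])
    .map((trace) => ({ name: trace.name, region: trace.region, source: "trace" }));
  // Hand-authored pure-HTTP entries carry the manual flag.
  const fromManual: SubProcessorRecord[] = manualEntries.map((entry) => ({
    name: entry.name,
    region: entry.region,
    source: "manual",
  }));
  // Plain code-unit comparison keeps the order independent of the machine's locale.
  return [...fromTraces, ...fromManual].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
}
```

Sorting with a locale-independent comparison matters here because the output has to be byte-identical across developer machines and CI.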
emit-all.mjs:
- Orchestrates the three; supports `--check` mode for all at once
- Single failure exit code per failed generator
ADR-022 amendment — trace frontmatter discriminated union
Every library trace at docs/library-decisions/<date>-<pkg>.md MUST declare two boolean fields after the existing ADR-022 fields:
- `is-sub-processor: boolean` (does this library send data to an external server it owns/operates?)
- `processes-pii: boolean` (does this library process personal data inside the calling process?)
When is-sub-processor: true, the following fields become REQUIRED:
- `data-sent: string[]` (references `PiiCategory` values)
- `region: "EU" | "EEA" | "US" | "UK" | "CH" | "OTHER"`
- `dpa-signed: ISO-date | null` (null = pending)
- `sccs-required: boolean`
- `contact: string` (email or URL)
When is-sub-processor: false, these fields MUST be absent. Validator enforces.
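One way to express this shape in TypeScript is a union discriminated on `is-sub-processor`, plus a runtime check for the untyped frontmatter a parser returns. This is a sketch under those assumptions; `validateTrace` and `summarizeTrace` are hypothetical names, not the validator the template ships.

```typescript
type Region = "EU" | "EEA" | "US" | "UK" | "CH" | "OTHER";

type SubProcessorTrace = {
  "is-sub-processor": true;
  "processes-pii": boolean;
  "data-sent": string[];
  region: Region;
  "dpa-signed": string | null; // ISO date, or null while the DPA is pending
  "sccs-required": boolean;
  contact: string;
};
type PlainTrace = { "is-sub-processor": false; "processes-pii": boolean };
type Trace = SubProcessorTrace | PlainTrace;

const CONDITIONAL_FIELDS = ["data-sent", "region", "dpa-signed", "sccs-required", "contact"] as const;

// Runtime check on parsed frontmatter; returns a list of readable errors.
function validateTrace(frontmatter: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const flag of ["is-sub-processor", "processes-pii"]) {
    if (typeof frontmatter[flag] !== "boolean") errors.push(`${flag} must be a boolean`);
  }
  const isSubProcessor = frontmatter["is-sub-processor"] === true;
  for (const field of CONDITIONAL_FIELDS) {
    const present = field in frontmatter; // a null dpa-signed still counts as present
    if (isSubProcessor && !present) errors.push(`${field} is required when is-sub-processor is true`);
    if (!isSubProcessor && present) errors.push(`${field} must be absent when is-sub-processor is false`);
  }
  return errors;
}

// The discriminant narrows the union, so `region` is only reachable on sub-processors.
function summarizeTrace(trace: Trace): string {
  return trace["is-sub-processor"] ? `sub-processor in ${trace.region}` : "not a sub-processor";
}
```

The type gives compile-time safety to code that consumes traces, while `validateTrace` covers the frontmatter itself, which arrives as untyped data.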
The /evaluate-library skill prompts the user for both binary fields during evaluation; when is-sub-processor: true, also prompts for the 5 conditional fields. Trace template in .claude/skills/evaluate-library/SKILL.md updated accordingly.
The weekly trace revalidation cron (ADR-023) checks dpa-signed for staleness: DPA dates older than 2 years trigger a re-confirmation issue.
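The staleness rule reduces to a date comparison. The sketch below assumes a pending DPA (`null`) is never reported as stale and that "2 years" means calendar years in UTC; neither detail is specified in the source, and `isDpaStale` is a hypothetical helper.

```typescript
// True when the DPA date lies more than two calendar years before `now`.
// Assumption: a pending DPA (null) is handled elsewhere and is never "stale".
function isDpaStale(dpaSigned: string | null, now: Date): boolean {
  if (dpaSigned === null) return false;
  const signed = new Date(dpaSigned);
  if (Number.isNaN(signed.getTime())) {
    throw new Error(`Invalid dpa-signed date: ${dpaSigned}`);
  }
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - 2);
  return signed.getTime() < cutoff.getTime();
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check easy to unit test.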
Pre-commit hook integration
.husky/pre-commit gains a step that runs pnpm compliance:emit-all if and only if any staged file matches:
- `packages/*/src/integrations/cms/**/*.ts` (Payload configs)
- `docs/library-decisions/*.md` (library traces)
- `compliance/*.yml` (the artifacts themselves; protects against manual edits)
Output is auto-staged via git add compliance/. The conditional check keeps unrelated commits fast (~10ms detection cost).
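The conditional check amounts to matching the staged paths against the three patterns. In the sketch below the globs are hand-translated to regular expressions and `shouldRunComplianceEmit` is a hypothetical helper; the staged paths would typically come from `git diff --cached --name-only`.

```typescript
// Hand-translated equivalents of the three globs listed above.
const TRIGGER_PATTERNS: RegExp[] = [
  /^packages\/[^/]+\/src\/integrations\/cms\/(?:.+\/)?[^/]+\.ts$/, // packages/*/src/integrations/cms/**/*.ts
  /^docs\/library-decisions\/[^/]+\.md$/, // docs/library-decisions/*.md
  /^compliance\/[^/]+\.yml$/, // compliance/*.yml
];

// True when at least one staged path should trigger `pnpm compliance:emit-all`.
function shouldRunComplianceEmit(stagedFiles: string[]): boolean {
  return stagedFiles.some((file) => TRIGGER_PATTERNS.some((pattern) => pattern.test(file)));
}
```

Because the check is a handful of regular-expression tests over a short list of paths, commits that touch none of these locations skip the generators entirely.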
CI integration
.github/workflows/ci.yml's validate job gains a step:
- name: Compliance manifest drift check
  run: pnpm compliance:emit-all --check
Position: after pnpm conformance, before pnpm coverage:diff. Same severity (hard error). Failure message includes the fix command.
Background retention purge job
Lives at core-shared/payload/retention-purge/retention-purge.job.ts. Receives ctx.queue (IJobQueue from core-shared/jobs) + ctx.config (Payload SanitizedConfig).
At app boot, the binder walks every collection, reads custom.retention.purgeSchedule, registers a scheduled job per collection with the corresponding cadence. The job body:
- Queries the collection for rows whose `activeRetention.duration` has elapsed (from `createdAt` for `from-creation`, from `updatedAt` for `from-last-access`)
- For each row:
  - If `postDeletion.action === "pseudonymize"`: NULL the PII fields, set `processing_restricted: true` (per Epic B's `IProcessingRestriction`)
  - If `postDeletion.action === "hard-delete"`: cascade delete via Payload's delete operation
- Emits one audit entry per processed row: `IAuditLog.record({ action: "DELETE", subject: row.id, actor: "system", reason: "retention-policy" })`
When auditLog isn't wired (consumer hasn't scaffolded core-audit), the audit emission is skipped without throwing.
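A simplified version of the job body is sketched below, covering only the hard-delete branch. It handles date-only ISO 8601 durations (`PnYnMnWnD`), and `subtractDuration`, `isExpired`, `purgeExpired`, and the `remove` callback are hypothetical names standing in for the real Payload and `IJobQueue` wiring.

```typescript
type Row = { id: string; createdAt: string; updatedAt: string };
type Trigger = "from-creation" | "from-last-access";
type AuditEntry = { action: "DELETE"; subject: string; actor: "system"; reason: "retention-policy" };
type AuditLog = { record(entry: AuditEntry): void };

// Subtract a date-only ISO 8601 duration from a date. Time components are not supported.
function subtractDuration(from: Date, duration: string): Date {
  const match = /^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$/.exec(duration);
  if (match === null || duration === "P") throw new Error(`Unsupported duration: ${duration}`);
  const part = (index: number): number => Number(match[index] ?? 0);
  const result = new Date(from.getTime());
  result.setUTCFullYear(result.getUTCFullYear() - part(1));
  result.setUTCMonth(result.getUTCMonth() - part(2));
  result.setUTCDate(result.getUTCDate() - part(3) * 7 - part(4));
  return result;
}

// A row is expired when its anchor timestamp is at or before now minus the duration.
function isExpired(row: Row, trigger: Trigger, duration: string, now: Date): boolean {
  const anchor = trigger === "from-creation" ? row.createdAt : row.updatedAt;
  return new Date(anchor).getTime() <= subtractDuration(now, duration).getTime();
}

// Delete expired rows; emit an audit entry only when an audit log is wired.
function purgeExpired(
  rows: Row[],
  trigger: Trigger,
  duration: string,
  now: Date,
  remove: (id: string) => void,
  auditLog?: AuditLog,
): string[] {
  const purged: string[] = [];
  for (const row of rows) {
    if (!isExpired(row, trigger, duration, now)) continue;
    remove(row.id);
    auditLog?.record({ action: "DELETE", subject: row.id, actor: "system", reason: "retention-policy" });
    purged.push(row.id);
  }
  return purged;
}
```

The optional chaining on `auditLog` is the whole "skip without throwing" behaviour: consumers that have not scaffolded `core-audit` simply pass nothing.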
Backfill scope (per Q6)
Stories cover backfill of existing template collections:
- `auth.users`: `displayName` tagged as `identification-username` (exportable, restrictable); `role` tagged as null (not PII per template default). Collection retention: `activeRetention: indefinite, postDeletion: 30d hard-delete, purgeSchedule: daily`. `PAYLOAD_AUTH_PII_DEFAULTS` covers `email` / `password` / `salt` / `hash` automatically; no `custom.authPii` override needed unless future custom auth fields are added.
- `blog.articles`: collection retention only (no PII fields by default; author refs not modeled as PII per template).
- `marketing-pages.site-settings`, `pages`: collection retention only (no PII).
- `media.media`: collection retention; if `uploadedBy` exists, tag it as `identification-username`.
- `navigation.header`: collection retention only.
Each backfill is one slice = one commit.
Conformance impact
- ESLint rule count: 10 → 11 (adds `pii-declaration-must-be-complete`)
- Manifest fields: unchanged (no per-use-case manifest fields added by Epic A; declarations are on Payload configs, not feature manifests)
- New brand: none in Epic A (brands are Epic B/C territory)
- Boot assertion: extended to validate `custom.retention.purgeSchedule` is parseable when the binder boots in production
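What "parseable" means for `purgeSchedule` is not pinned down above. The sketch below assumes it is either one of the three presets or a five-field cron expression made of digits and the `*`, `/`, `,`, `-` characters; `isParseablePurgeSchedule` and `assertPurgeSchedules` are hypothetical names, and a real implementation would more likely defer to the scheduler's own parser.

```typescript
const PRESETS: string[] = ["daily", "weekly", "monthly"];

// Rough parseability check: a preset name or a five-field cron expression.
function isParseablePurgeSchedule(schedule: string): boolean {
  if (PRESETS.includes(schedule)) return true;
  const fields = schedule.trim().split(/\s+/);
  return fields.length === 5 && fields.every((field) => /^[\d*\/,\-]+$/.test(field));
}

// Boot-time assertion: fail fast, naming every collection with a bad schedule.
function assertPurgeSchedules(collections: { slug: string; purgeSchedule: string }[]): void {
  const invalid = collections.filter((collection) => !isParseablePurgeSchedule(collection.purgeSchedule));
  if (invalid.length > 0) {
    const names = invalid.map((collection) => collection.slug).join(", ");
    throw new Error(`Unparseable custom.retention.purgeSchedule in: ${names}`);
  }
}
```

Collecting all invalid collections before throwing means one boot failure reports every problem instead of the first one only.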
Testing decisions
- Type primitives: vitest tests on `core-shared/payload/pii-types.ts` and `retention-types.ts`: verify TS shape via `@ts-expect-error` on malformed declarations; verify defaults exports.
- Generators: each script gets unit tests covering: happy path, `--check` matches, `--check` mismatch with readable diff, empty input (no collections), auth-managed defaults applied, `custom.authPii` overrides applied, sub-processor frontmatter discriminated union parsing.
- ESLint rule: RuleTester-based fixture suite mirroring `no-undeclared-audit.test.js`. Cover: complete `custom.pii` passes, missing `category` fires, missing `purpose` fires, missing `exportable` fires, non-PII collection no-op, malformed YAML in trace no-op.
- Retention purge job: unit test with in-memory Payload mock; verify schedule registration, row matching, audit emission, pseudonymize vs hard-delete branches, optional `auditLog` graceful skip.
- Integration: e2e test using the existing dev-seed setup: declare a `custom.retention` on a test collection, run `pnpm compliance:retention-policy --check`, assert output. Same for `--check` mismatch with a forced manual edit.
- No repository contract suite: Epic A doesn't introduce a new `IXRepository`.
- Coverage: Epic A's modules join the L0 vitest thresholds (per `coverage.bands` in the affected packages' manifests); L1 `pnpm coverage:diff` gates the slices in dispatch.
- Prior art to mirror:
  - Generator + `--check` pattern: `scripts/coverage/diff.mjs` (similar diff-against-committed-output pattern)
  - ESLint rule shape: `packages/core-eslint/rules/no-undeclared-audit.{js,test.js}`
  - Background job in `core-shared`: existing `packages/core-shared/src/jobs/payload-job-queue.{ts,test.ts}`
  - Ambient TypeScript module augmentation: existing patterns in `node_modules/@types/*` (search for `declare module`)
Open questions
- Q1: Should a `retention-must-be-declared` ESLint rule join `pii-declaration-must-be-complete`? Recommended: No, defer. Q6 already establishes that every existing template collection gets `custom.retention`; making it ESLint-enforced for all Payload collections in all downstream consumers is stricter than the strategy ADR. Add the rule in a follow-up PRD if a consumer feels the pain.
- Q2: How does the `from-last-access` retention trigger interact with Payload: does Payload track `updatedAt` natively? Payload sets `createdAt` and `updatedAt` by default. `from-last-access` reads `updatedAt`. If a consumer needs true "last read" tracking (rare), they add a custom hook updating `lastAccessedAt`; the purge job uses that field via `custom.retention.lastAccessFieldOverride` (deferred, not in Epic A).
- Q3: Should the generator emit `compliance/manifest.lock.yml` containing a hash of source declarations, for fast `--check` mode? Recommended: No, defer. Full regenerate-and-diff is fast enough (~50ms for the template's 6 collections). Reconsider if generators grow slow on a real consumer with hundreds of collections.
- Q4: How does the purge job handle race conditions between concurrent purge runs (e.g., misconfigured cron firing daily and weekly simultaneously)? Recommended: per-job advisory lock via Payload's job system (existing `IJobQueue` should support this); test for it in the job's unit suite. If not supported, document as a known limitation in the consumer's runbook.
- Q5: Does the pre-commit hook trigger on `compliance/*.yml` edits themselves (the auto-stage behavior could feedback-loop)? Recommended: Yes, conditionally. The hook runs `emit-all` when `compliance/*.yml` is staged because that's the manual-edit case; the generator regenerates the YAML and auto-stages the regenerated version, replacing the developer's manual edit. The developer sees this in their `git status` post-commit and can re-edit if they had intent. This prevents drift from a manual edit silently surviving.
Out of scope (deferred)
- DSR scaffold and consent abstraction (Epic B)
- Security headers + rate-limit + SBOM (Epic C)
- Compliance fill-in docs (Epic D)
- `retention-must-be-declared` ESLint rule (see Q1 above)
- `lastAccessedAt` field hook + true "from-last-access" retention (see Q2)
- `compliance/manifest.lock.yml` for faster `--check` (see Q3)
- Backfill of any Payload `auth: true` collection beyond `users` (none exist yet)
- Cross-region transfer documentation (DPIA / TIA): Epic D's territory
- Migration tooling for downstream consumers upgrading from a pre-ADR-025 version of the template (the template hasn't been versioned with consumers yet; no migration needed)
Further notes
- Builds on: ADR-018 (audit channel — purge emits audit entries), ADR-022 (library evaluation policy — extended for sub-processor fields), ADR-023 (CI security + supply chain — pre-commit + CI integration follows the established pattern), ADR-025 (strategy umbrella).
- Pairs with: Epic B PRD `dsr-consent-and-cookie-banner.prd.md` (consumes A's PII tags for DSR cascade); Epic D PRD `compliance-docs-scaffolds.prd.md` (references A's generator output formats in `.example.yml` files).
- Sequencing: Epic A is the dependency-graph root for Epic B. Epic B's PRD will be authored once Epic A's stories are at least partially dispatched (decomposer needs A's type primitives to settle B's interfaces).
- Stakeholders: template authors (most affected: three new generators, ESLint rule, ADR-022 amendment), downstream consumers (positively affected: gain three compliance artifacts), AI agents operating in feature code (positively affected: typed Payload custom config catches invalid declarations at compile time), compliance reviewers (positively affected: `compliance/` directory becomes audit evidence).