
ADR-0014: Cross-Pollination Security Boundary — Three-Roles Model

Status: Proposed (pending privacy counsel review by Allen)
Date: 2026-05-14
Deciders: Seth (Lead Architect), Blair (CEO)

Context

MemberIntel aggregates data across MemberPress customer sites. The product depends on two data stores: a global brain (tenant_id=“global”) seeded with MemberPress KB content, and per-customer brains (one tenant_id per customer) that will accumulate site-specific data from MP Connect starting in V1.5. The SPEC (§6.3) defines cross-pollination as insights derived from one customer’s data that may be useful to another customer — the data flywheel that makes the product more valuable with every user.

The architecture deep-dive (arch-cross-pollination) identifies three failure modes: re-identification (implicit PII survives abstraction), tenant leakage (the pipeline itself crosses boundaries), and opt-out bypass (consent not enforced at the source). It proposes a three-roles model to contain the cross-pollination pipeline’s blast radius.

Today, V1’s search_brain() in src/memberintel/api/retrieval/search.py falls back from a per-user tenant_id to "global" when the user has no entries (lines 83-92). This fallback means every V1 user can reach global brain content, which is correct for V1 (the global brain is the only brain) but becomes a security boundary question once per-customer brains exist.
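For reference while the TODO is open, the V1 fallback can be sketched as below. This is a minimal illustration, not the code in search.py: the in-memory brain dict, the Entry type, and the substring match are hypothetical stand-ins for the real database query. What it captures is that the switch to "global" is keyed on the tenant having no entries at all, not on a query returning no matches.

```python
from __future__ import annotations

from dataclasses import dataclass

GLOBAL_TENANT = "global"


@dataclass(frozen=True)
class Entry:
    entry_id: str
    tenant_id: str
    text: str


def search_brain_v1(brain: dict[str, list[Entry]], tenant_id: str, query: str) -> list[Entry]:
    """Hypothetical sketch of the V1 fallback behavior (not the real signature).

    If the caller's tenant has no entries at all, the search silently
    switches to the global brain. This is safe in V1 only because
    "global" is the only populated brain.
    """
    entries = brain.get(tenant_id, [])
    if not entries:
        # V1 fallback: an empty tenant is redirected to the global brain.
        entries = brain.get(GLOBAL_TENANT, [])
    q = query.lower()
    # Naive substring match stands in for the real retrieval query.
    return [e for e in entries if q in e.text.lower()]
```

Once per-customer brains are populated, this branch is the one the ADR removes.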

Privacy counsel (Allen, Blair’s lawyer) is scheduled to review this ADR before it moves to Accepted. The late-May architecture review (action item from the 2026-05-11 Blair x Seth working session) will cover it.

Decision

Three data classification tiers govern all brain data:

System-level data (tenant_id=“global”) — MemberPress KB content, curated playbooks, and any content authored by Caseproof staff. Available to all authenticated users. No customer-specific data. This is the only tier V1 uses.
Customer-specific data (tenant_id per customer) — per-customer brain entries, site-specific metrics, agent action logs, feedback signals. Never shared across tenants without explicit opt-in consent. Introduced in V1.5, isolated by tenant_id + Row-Level Security (RLS) per ADR-0003.
Aggregated/anonymized insights — patterns abstracted from customer-specific data via the cross-pollination pipeline (e.g., “60% of MemberPress sites use recurring payments”). Safe to share across tenants after content-lead review, k-anonymity checks, and opt-in consent. Not present in V1 or V1.5; deferred to V2+.
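The tier rules can be expressed as a small visibility check. The sketch below is a hypothetical illustration: DataTier, may_read(), and the approved_insight flag (standing in for content-lead review, k-anonymity checks, and consent all having passed) are not names from the codebase, and it assumes customer-specific rows only ever cross tenants by way of the aggregated tier.

```python
from enum import Enum


class DataTier(Enum):
    SYSTEM = "system"          # tenant_id = "global": KB content, curated playbooks
    CUSTOMER = "customer"      # one tenant_id per customer; V1.5+
    AGGREGATED = "aggregated"  # anonymized cross-pollination insights; V2+


def may_read(tier: DataTier, row_tenant: str, reader_tenant: str, *,
             approved_insight: bool = False) -> bool:
    """Hypothetical visibility rule for an authenticated reader."""
    if tier is DataTier.SYSTEM:
        return True  # available to all authenticated users
    if tier is DataTier.CUSTOMER:
        return row_tenant == reader_tenant  # never readable by another tenant
    if tier is DataTier.AGGREGATED:
        # Shareable only after review, k-anonymity, and consent checks pass.
        return approved_insight
    return False  # unknown tier: deny by default
```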

V1 scope: system-level data only. The global brain contains only MemberPress KB content seeded via the Hive Mind MCP pipeline. There is no cross-pollination in V1. No customer-specific data enters the global brain. The search_brain() global fallback (lines 83-92) is documented as a TODO to remove once per-customer brains are populated in V1.5.

V1.5 scope: per-customer brains with tenant_id isolation. V1.5 introduces per-customer brains, each scoped to a single tenant_id. RLS enforces current_setting('app.tenant_id')::uuid = tenant_id on every row read (ADR-0003). The search_brain() fallback is replaced with a dual search: the user’s own tenant first, then the global brain as a supplementary result set — never crossing into another customer’s tenant.
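To make the RLS predicate concrete, the following Python sketch emulates it as an in-memory row filter. It is illustrative only, since the database evaluates the policy itself; BrainRow, rls_visible_rows(), and the session_tenant argument (standing in for the app.tenant_id setting) are hypothetical. Returning nothing when no tenant context is set is this sketch's own fail-closed choice; the ADR does not specify that case.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class BrainRow:
    tenant_id: uuid.UUID
    text: str


def rls_visible_rows(rows: list[BrainRow], session_tenant: str | None) -> list[BrainRow]:
    """In-memory emulation of the ADR-0003 policy predicate:

        current_setting('app.tenant_id')::uuid = tenant_id
    """
    if not session_tenant:
        # Fail closed: no tenant context means no customer rows at all.
        return []
    current = uuid.UUID(session_tenant)  # mirrors the ::uuid cast; raises on malformed input
    return [r for r in rows if r.tenant_id == current]
```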

Cross-pollination is a V2+ feature, gated by four conditions:

Privacy counsel review and sign-off — Allen reviews and approves the cross-pollination pipeline design, consent language, and k-anonymity thresholds before any cross-tenant data flow is built.
Explicit opt-in from the source customer — a cross_pollination_consent field on the tenant record (default true per SPEC §6.3, but with consent versioning so scope changes require re-consent). The pipeline’s first SQL query excludes opted-out tenants before any brain data is read.
k-anonymity floor of N=5 — a candidate insight must be derivable from the data of at least 5 distinct customers before it enters the drafting stage. A pattern supported by fewer than 5 customers is treated as identifying and must not be promoted. N is tunable upward; any value below 3 would be indefensible.
Three-roles model for the pipeline — the cross-pollination service runs as its own Cloud Run Job with its own service account and database role (cross_pollination_role). This role has read-only access to per-customer brain entries with positive feedback signals, and write access only to the global_brain_candidates staging table. It cannot read tenant records, member data, or transactions. The content lead reviews candidates via a separate role (content_lead_role) that can read the staging table and write approvals to the global brain. The API’s application role cannot see the staging table. Three roles, three responsibilities, no overlap.
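The first two pipeline gates, consent and then k-anonymity, can be sketched as follows. The Tenant type and its consent_version field are hypothetical (the ADR calls for consent versioning but does not define a schema), and the sketch assumes "tunable upward" means N may be raised above 5 but never lowered.

```python
from __future__ import annotations

from dataclasses import dataclass

K_ANONYMITY_FLOOR = 5  # N=5 per this ADR; assumed to be raisable, never lowerable


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    cross_pollination_consent: bool
    consent_version: int  # hypothetical: version of the consent terms the tenant accepted


def consenting_tenant_ids(tenants: list[Tenant], required_version: int) -> set[str]:
    """Gate 1: decide which tenants are in scope before any brain data is read.

    Opted-out tenants are excluded, and so are tenants whose consent
    predates the current scope (they must re-consent).
    """
    return {
        t.tenant_id
        for t in tenants
        if t.cross_pollination_consent and t.consent_version >= required_version
    }


def passes_k_anonymity(supporting: set[str], consenting: set[str],
                       k: int = K_ANONYMITY_FLOOR) -> bool:
    """Gate 2: a candidate needs support from at least k distinct consenting customers."""
    if k < K_ANONYMITY_FLOOR:
        raise ValueError(f"k={k} is below the floor of {K_ANONYMITY_FLOOR}")
    # Intersect defensively so a non-consenting tenant can never count toward k.
    return len(supporting & consenting) >= k
```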

The search_brain() global fallback is removed in V1.5. The current fallback (lines 83-92) exists because V1 has only the global brain. In V1.5, search returns the user’s own tenant results plus global-brain supplementary results. Falling back to another customer’s tenant_id is never correct.

Consequences

Positive:

Clear security boundary documented before any cross-tenant data flow exists — privacy counsel reviews a design, not a retrofit.
V1 and V1.5 ship without cross-pollination risk — the global brain contains only authored/seeded content, never customer data.
The three-roles model means a grep for cross_pollination_role finds every grant and code path that runs with the pipeline’s database role; the boundary is auditable.
Consent versioning (§6.3 requirement) ensures scope changes require re-consent — no purpose-creep without customer awareness.
The k-anonymity floor prevents patterns drawn from fewer than 5 customers from being promoted, which, together with content-lead review, narrows the re-identification failure mode.

Negative / costs:

V1 customers see only global brain content — no cross-pollinated insights, thinner product value until enough customers generate per-customer data.
The V1.5 dual-search (tenant + global) adds query complexity compared to the single-tenant fallback.
Deferring cross-pollination to V2 means the data flywheel starts late — until then, content-lead curation is the only way the global brain grows.
The three-roles model requires a separate Cloud Run Job, service account, and database role — infrastructure cost and operational complexity that only pays off when cross-pollination ships.

Mitigations:

Content lead seeds 50+ playbooks at launch (SPEC requirement), giving the global brain enough value to retain early users until cross-pollination activates.
The V1.5 dual-search is a small extension of existing search_brain() logic — tenant filter first, global filter second, merge and deduplicate.
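A minimal sketch of that merge step, assuming own-tenant and global hits arrive from two separate tenant-filtered queries. Hit, dual_search(), the text-based dedupe key, and the limit parameter are hypothetical. The defensive check rejects any hit whose tenant_id is neither the caller's nor "global", so an upstream bug surfaces as an error instead of a leak.

```python
from __future__ import annotations

from dataclasses import dataclass

GLOBAL_TENANT = "global"


@dataclass(frozen=True)
class Hit:
    entry_id: str
    tenant_id: str
    text: str
    score: float


def dual_search(own_hits: list[Hit], global_hits: list[Hit], tenant_id: str,
                limit: int = 10) -> list[Hit]:
    """Tenant results first, global results second, merged and deduplicated."""
    allowed = {tenant_id, GLOBAL_TENANT}
    # Rank within each source, but keep every tenant hit ahead of every global hit.
    ordered = (sorted(own_hits, key=lambda h: h.score, reverse=True)
               + sorted(global_hits, key=lambda h: h.score, reverse=True))
    merged: list[Hit] = []
    seen: set[str] = set()
    for hit in ordered:
        if len(merged) >= limit:
            break
        if hit.tenant_id not in allowed:
            # Never return another customer's row, even if a query misbehaves.
            raise ValueError(f"cross-tenant hit {hit.entry_id!r} from {hit.tenant_id!r}")
        key = hit.text.strip().lower()  # hypothetical dedupe key
        if key in seen:
            continue  # the tenant's copy wins because it was visited first
        seen.add(key)
        merged.append(hit)
    return merged
```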
The search_brain() TODO at line 83 is tracked for removal in V1.5; the fallback is explicitly not a precedent for cross-tenant access.

Alternatives considered

Cross-pollination in V1 — rejected: V1 has no per-customer brains to cross-pollinate from. The global brain is seeded content only. Building the pipeline before there’s source data is premature infrastructure.
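Global fallback persists in V1.5 — rejected: falling back from a user’s empty tenant to “global” is fine in V1, where global is the only populated brain. In V1.5 the fallback has the wrong shape: global results should supplement the user’s own tenant results rather than replace them only when the tenant happens to be empty, and any fallback that resolves to a tenant_id other than the caller’s own would be a cross-tenant leak. The fallback must be replaced with an explicit dual search.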
Two-roles model (pipeline role + API role) — rejected: the architecture deep-dive identifies this as the “pipeline itself leaks across tenants” failure mode. If the pipeline role can both read customer data and write to the global brain, a misconfigured query or prompt injection could write raw customer data into the global brain in a single transaction. The staging table and content-lead review role prevent this — candidates are never in the global brain until a human approves them.
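The separation argument can be checked mechanically against a grant matrix. In the hypothetical sketch below, the role names cross_pollination_role and content_lead_role and the global_brain_candidates table come from this ADR, while customer_brain_entries, global_brain, app_role, and the single_role_leak_paths() helper are illustrative. The check flags any single role that can both read customer entries and insert into the global brain, which is the two-roles failure mode described above.

```python
from __future__ import annotations

# Hypothetical grant matrix: role -> set of (table, privilege).
GRANTS: dict[str, set[tuple[str, str]]] = {
    "cross_pollination_role": {
        ("customer_brain_entries", "SELECT"),   # positive-feedback rows only; row filter not modeled
        ("global_brain_candidates", "INSERT"),  # staging table only
    },
    "content_lead_role": {
        ("global_brain_candidates", "SELECT"),
        ("global_brain", "INSERT"),             # writes approvals only
    },
    "app_role": {
        ("customer_brain_entries", "SELECT"),   # still scoped by tenant RLS
        ("global_brain", "SELECT"),             # no access to the staging table
    },
}


def can(grants: dict[str, set[tuple[str, str]]], role: str, table: str, privilege: str) -> bool:
    return (table, privilege) in grants.get(role, set())


def single_role_leak_paths(grants: dict[str, set[tuple[str, str]]]) -> list[str]:
    """Roles that could copy customer data into the global brain in one transaction."""
    return sorted(
        role
        for role in grants
        if can(grants, role, "customer_brain_entries", "SELECT")
        and can(grants, role, "global_brain", "INSERT")
    )
```

Merging the two pipeline-side roles into one, as the two-roles alternative would, makes the check report that merged role.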
Differential privacy instead of k-anonymity — considered but deferred: differential privacy adds mathematical guarantees but requires calibrated noise injection and a privacy budget, complexity that is not justified at current scale. k-anonymity at N=5 plus content-lead review is judged sufficient for V2 launch, subject to privacy counsel sign-off. If scale or regulatory requirements change, this ADR will be updated.

For: Seth Shoultes, AI Engineer, Blair Williams, Santiago Perez Asis, Product Lead