ADR-0014: Cross-Pollination Security Boundary — Three-Roles Model
Status: Proposed (pending privacy counsel review by Allen)
Date: 2026-05-14
Deciders: Seth (Lead Architect), Blair (CEO)
Context
MemberIntel aggregates data across MemberPress customer sites. The product depends on two data stores: a global brain (tenant_id="global") seeded with MemberPress KB content, and per-customer brains (tenant_id per customer) that will accumulate site-specific data from MP Connect starting in V1.5. The SPEC (SS6.3) defines cross-pollination as insights derived from one customer’s data that may be useful to another customer. This is the data flywheel that makes the product more valuable with every user.
The architecture deep-dive (arch-cross-pollination) identifies three failure modes: re-identification (implicit PII survives abstraction), tenant leakage (the pipeline itself crosses boundaries), and opt-out bypass (consent not enforced at the source). It proposes a three-roles model to contain the cross-pollination pipeline’s blast radius.
Today, V1’s search_brain() in src/memberintel/api/retrieval/search.py falls back from a per-user tenant_id to "global" when the user has no entries (lines 83-92). This fallback means every V1 user can reach global brain content, which is correct for V1 (the global brain is the only brain) but becomes a security boundary question once per-customer brains exist.
Privacy counsel (Allen, Blair’s lawyer) is scheduled to review this ADR before it moves to Accepted. The late-May architecture review (action item from the 2026-05-11 Blair x Seth working session) will cover it.
Decision
Three data classification tiers govern all brain data:
- System-level data (tenant_id="global"): MemberPress KB content, curated playbooks, and any content authored by Caseproof staff. Available to all authenticated users. No customer-specific data. This is the only tier V1 uses.
- Customer-specific data (tenant_id per customer): per-customer brain entries, site-specific metrics, agent action logs, feedback signals. Never shared across tenants without explicit opt-in consent. Introduced in V1.5, isolated by tenant_id + Row-Level Security (RLS) per ADR-0003.
- Aggregated/anonymized insights: patterns abstracted from customer-specific data via the cross-pollination pipeline (e.g., “60% of MemberPress sites use recurring payments”). Safe to share across tenants after content-lead review, k-anonymity checks, and opt-in consent. Not present in V1 or V1.5; deferred to V2+.
V1 scope: system-level data only. The global brain contains only MemberPress KB content seeded via the Hive Mind MCP pipeline. There is no cross-pollination in V1. No customer-specific data enters the global brain. The search_brain() global fallback (lines 83-92) is documented as a TODO to remove once per-customer brains are populated in V1.5.
V1.5 scope: per-customer brains with tenant_id isolation. V1.5 introduces per-customer brains, each scoped to a single tenant_id. RLS enforces current_setting('app.tenant_id')::uuid = tenant_id on every row read (ADR-0003). The search_brain() fallback is replaced with a dual search: the user’s own tenant first, then the global brain as a supplementary result set — never crossing into another customer’s tenant.
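The dual search described above can be sketched in a few lines. This is a minimal in-memory illustration, not the code in search.py: BrainEntry, dual_search, and the sample data are hypothetical names introduced here, tenant IDs are plain strings for readability, and deduplicating on entry text is an assumption about what "merge" should mean.

```python
from dataclasses import dataclass

GLOBAL_TENANT = "global"


@dataclass(frozen=True)
class BrainEntry:
    entry_id: str
    tenant_id: str
    text: str
    score: float  # retrieval relevance; higher is better


def dual_search(entries, tenant_id, limit=5):
    """Return the caller's own tenant hits first, then global-brain hits."""

    def ranked(owner):
        hits = [e for e in entries if e.tenant_id == owner]
        return sorted(hits, key=lambda e: e.score, reverse=True)

    merged, seen_text = [], set()
    # Only two tenant IDs are ever queried: the caller's own and "global".
    for entry in ranked(tenant_id) + ranked(GLOBAL_TENANT):
        if entry.text in seen_text:  # deduplicate on content (assumption)
            continue
        seen_text.add(entry.text)
        merged.append(entry)
    return merged[:limit]


# Hypothetical sample data: two tenants plus the global brain.
ENTRIES = [
    BrainEntry("g1", "global", "Enable recurring payments", 0.9),
    BrainEntry("g2", "global", "Set up coupons", 0.4),
    BrainEntry("a1", "tenant-a", "Churn spikes in March", 0.7),
    BrainEntry("a2", "tenant-a", "Enable recurring payments", 0.5),
    BrainEntry("b1", "tenant-b", "Tenant B pricing notes", 0.99),
]

results = dual_search(ENTRIES, "tenant-a")
print([e.entry_id for e in results])  # ['a1', 'a2', 'g2']
```

Tenant B's entry has the highest score in the sample but never appears, because the function has no code path that queries a third tenant_id. In production the same guarantee is expected to come from RLS rather than from application logic alone.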
Cross-pollination is a V2+ feature, gated by four conditions:
- Privacy counsel review and sign-off: Allen reviews and approves the cross-pollination pipeline design, consent language, and k-anonymity thresholds before any cross-tenant data flow is built.
- Explicit opt-in from the source customer: a cross_pollination_consent field on the tenant record (default true per SPEC SS6.3, but with consent versioning so scope changes require re-consent). The pipeline’s first SQL statement excludes opted-out tenants before any data is read.
- k-anonymity floor of N=5: a candidate insight must be derivable from at least 5 customers’ data before it enters the drafting stage. Below 5, the pattern is identifying by definition and must not be promoted. N is tunable upward; below 3 is indefensible.
- Three-roles model for the pipeline: the cross-pollination service runs as its own Cloud Run Job with its own service account and database role (cross_pollination_role). This role has read-only access to per-customer brain entries with positive feedback signals, and write access only to the global_brain_candidates staging table. It cannot read tenant records, member data, or transactions. The content lead reviews candidates via a separate role (content_lead_role) that can read the staging table and write approvals to the global brain. The API’s application role cannot see the staging table. Three roles, three responsibilities, no overlap.
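The ordering of the consent filter and the k-anonymity gate can be illustrated with a small sketch. It is an in-memory analogue of what the pipeline would do in SQL, under stated assumptions: Signal, eligible_patterns, and the pattern labels are hypothetical, and the count is taken over distinct tenants rather than rows so that one active customer cannot satisfy the floor alone.

```python
from dataclasses import dataclass

K_ANONYMITY_FLOOR = 5  # N=5 per this ADR; tunable upward only


@dataclass(frozen=True)
class Signal:
    tenant_id: str
    pattern: str  # abstracted pattern label (hypothetical)


def eligible_patterns(signals, consenting_tenants, k=K_ANONYMITY_FLOOR):
    """Return patterns backed by at least k distinct consenting tenants."""
    if k < K_ANONYMITY_FLOOR:
        raise ValueError("k may only be tuned upward from the floor")
    # Step 1: drop opted-out tenants before anything else looks at the data.
    allowed = [s for s in signals if s.tenant_id in consenting_tenants]
    # Step 2: collect distinct tenants per pattern (rows are not counted).
    tenants_by_pattern = {}
    for s in allowed:
        tenants_by_pattern.setdefault(s.pattern, set()).add(s.tenant_id)
    # Step 3: only patterns that meet the floor may enter drafting.
    return sorted(p for p, t in tenants_by_pattern.items() if len(t) >= k)


# Hypothetical sample: six tenants show one pattern, four show another.
signals = [Signal(f"t{i}", "uses_recurring_payments") for i in range(6)]
signals += [Signal(f"t{i}", "offers_annual_discount") for i in range(4)]
signals += [Signal("t0", "offers_annual_discount")] * 3  # repeat rows, one tenant
consenting = {f"t{i}" for i in range(6)} - {"t5"}  # t5 has opted out

print(eligible_patterns(signals, consenting))  # ['uses_recurring_payments']
print(eligible_patterns(signals, consenting - {"t4"}))  # []
```

The second call shows why consent is applied first: when one more tenant opts out, the remaining four no longer meet the floor and the pattern is withheld.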
The search_brain() global fallback is removed in V1.5. The current fallback (lines 83-92) exists because V1 has only the global brain. In V1.5, search returns the user’s own tenant results plus global-brain supplementary results. Falling back to another customer’s tenant_id is never correct.
Consequences
Positive:
- Clear security boundary documented before any cross-tenant data flow exists — privacy counsel reviews a design, not a retrofit.
- V1 and V1.5 ship without cross-pollination risk — the global brain contains only authored/seeded content, never customer data.
- The three-roles model means grep cross_pollination_role finds every line of code that touches the pipeline; the boundary is auditable.
- Consent versioning (SS6.3 requirement) ensures scope changes require re-consent, so there is no purpose creep without customer awareness.
- k-anonymity floor prevents singleton patterns from being promoted, closing the re-identification failure mode.
Negative / costs:
- V1 customers see only global brain content — no cross-pollinated insights, thinner product value until enough customers generate per-customer data.
- The V1.5 dual-search (tenant + global) adds query complexity compared to the single-tenant fallback.
- Deferring cross-pollination to V2+ means the data flywheel starts late; content-lead curation of the global brain is the only growth mechanism in V1.
- The three-roles model requires a separate Cloud Run Job, service account, and database role — infrastructure cost and operational complexity that only pays off when cross-pollination ships.
Mitigations:
- Content lead seeds 50+ playbooks at launch (SPEC requirement), giving the global brain enough value to retain early users until cross-pollination activates.
- The V1.5 dual-search is a small extension of existing search_brain() logic: tenant filter first, global filter second, merge and deduplicate.
- The search_brain() TODO at line 83 is tracked for removal in V1.5; the fallback is explicitly not a precedent for cross-tenant access.
Alternatives considered
- Cross-pollination in V1 — rejected: V1 has no per-customer brains to cross-pollinate from. The global brain is seeded content only. Building the pipeline before there’s source data is premature infrastructure.
- Global fallback persists in V1.5 — rejected: falling back from a user’s empty tenant to “global” is fine in V1 (global is the only tenant). In V1.5, falling back to another customer’s tenant_id would be a cross-tenant leak. The fallback must be replaced with explicit dual-search.
- Two-roles model (pipeline role + API role) — rejected: the architecture deep-dive identifies this as the “pipeline itself leaks across tenants” failure mode. If the pipeline role can both read customer data and write to the global brain, a misconfigured query or prompt injection could write raw customer data into the global brain in a single transaction. The staging table and content-lead review role prevent this — candidates are never in the global brain until a human approves them.
- Differential privacy instead of k-anonymity — considered but deferred: differential privacy adds mathematical guarantees but requires calibrated noise injection and a privacy budget, which is complexity that doesn’t justify itself at V1 scale. k-anonymity at N=5 plus content-lead review is sufficient for V2 launch. If scale or regulatory requirements change, this ADR is updated.
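The reason the two-roles alternative was rejected can be expressed as a mechanical check on role grants. The sketch below is illustrative only: the role names cross_pollination_role and content_lead_role and the global_brain_candidates table come from this ADR, while api_role, the other table names, the API role's write grant, and boundary_violations are assumptions made for the example.

```python
# Hypothetical table names, except global_brain_candidates (named in this ADR).
CUSTOMER_TABLES = {"customer_brain_entries", "tenants", "members", "transactions"}
STAGING = "global_brain_candidates"
GLOBAL_BRAIN = "global_brain"

THREE_ROLES = {
    "cross_pollination_role": {"read": {"customer_brain_entries"}, "write": {STAGING}},
    "content_lead_role": {"read": {STAGING}, "write": {GLOBAL_BRAIN}},
    # api_role is an assumed name for the API's application role.
    "api_role": {
        "read": {"customer_brain_entries", GLOBAL_BRAIN},
        "write": {"customer_brain_entries"},
    },
}

# The rejected alternative: the pipeline writes straight to the global brain.
TWO_ROLES = {
    "pipeline_role": {"read": {"customer_brain_entries"}, "write": {GLOBAL_BRAIN}},
    "api_role": THREE_ROLES["api_role"],
}


def boundary_violations(grants, api_role="api_role"):
    """List grants that would break the cross-pollination boundary."""
    problems = []
    for role, perms in grants.items():
        # No single role may both read customer data and write the global brain.
        if perms["read"] & CUSTOMER_TABLES and GLOBAL_BRAIN in perms["write"]:
            problems.append(f"{role}: reads customer data and can write {GLOBAL_BRAIN}")
    api = grants.get(api_role, {"read": set(), "write": set()})
    # The API's application role must not see the staging table.
    if STAGING in api["read"] | api["write"]:
        problems.append(f"{api_role}: can see {STAGING}")
    return problems


print(boundary_violations(THREE_ROLES))  # []
print(boundary_violations(TWO_ROLES))
```

A check of this shape could run in CI against the actual database grants, so that a later permission change that collapses two roles into one is caught before deployment.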