
ADR-0006: Hive Mind as Brain Seed Source



Status: Accepted
Date: 2026-05-12
Deciders: Seth (Lead Architect)

Context

Slice 1 needs a global brain — a vector store of MemberPress product knowledge that the chat endpoint can cite. Building this from scratch would require curating, scraping, and embedding 45K+ docs, 70K+ code chunks, and 13K+ graph entities across 11 Caseproof products.

The Hive Mind (~/Local Sites/caseproof-agent/deploy/hive-mind-mcp/) already has all of this content indexed and accessible via an MCP endpoint (hive-mind.caseproofagent.com/mcp). It serves 48 MCP tools for querying Caseproof product knowledge.

The decision is whether to:

Build MemberIntel’s brain from scratch (curate + scrape + embed)
Seed from Hive Mind’s existing content (copy + re-embed with our own model)
Query Hive Mind at chat-time (no local brain, real-time API calls)

Decision

Seed from Hive Mind with re-embedding. MemberIntel will:

Call Hive Mind’s hive_format_context MCP tool to extract content
Re-embed all extracted content with Voyage voyage-3-lite (512d) into MemberIntel’s own pgvector brain_entries table
Run a nightly sync job to catch Hive Mind updates
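
The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the contents of scripts/seed_brain.py: the Doc type, the seed_brain function, and the embed/upsert callables are hypothetical stand-ins for the Hive Mind extraction, the Voyage embedding call, and the write to brain_entries. Only the 512-dimension check comes from the decision itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Sequence


@dataclass(frozen=True)
class Doc:
    source_id: str  # hypothetical stable ID for an item extracted from Hive Mind
    text: str


def batched(items: Sequence[Doc], size: int) -> Iterator[Sequence[Doc]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def seed_brain(
    docs: Sequence[Doc],
    embed: Callable[[list[str]], list[list[float]]],
    upsert: Callable[[str, str, list[float]], None],
    batch_size: int = 64,
    expected_dim: int = 512,  # voyage-3-lite dimension stated in the ADR
) -> int:
    """Embed docs in batches and write one entry per doc; return entries written."""
    written = 0
    for batch in batched(docs, batch_size):
        vectors = embed([d.text for d in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedder returned a different number of vectors")
        for doc, vec in zip(batch, vectors):
            if len(vec) != expected_dim:
                raise ValueError(f"expected {expected_dim} dims, got {len(vec)}")
            # Keyed by source_id, so a re-run overwrites instead of duplicating.
            upsert(doc.source_id, doc.text, vec)
            written += 1
    return written


# In-memory stand-ins so the sketch runs without any external service.
def fake_embed(texts: list[str]) -> list[list[float]]:
    return [[float(len(t))] * 512 for t in texts]


store: dict[str, tuple[str, list[float]]] = {}


def fake_upsert(source_id: str, text: str, vec: list[float]) -> None:
    store[source_id] = (text, vec)


docs = [Doc("doc-1", "How to create a membership"), Doc("doc-2", "Coupon codes")]
first_run = seed_brain(docs, fake_embed, fake_upsert, batch_size=1)
second_run = seed_brain(docs, fake_embed, fake_upsert)  # safe to repeat
```

Passing the embedder and the writer in as callables keeps the loop testable offline and leaves the embedding model swappable, which is one of the stated goals of the decision.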

This gives us:

Immediate content coverage (45K+ docs, 70K+ code chunks, 13K+ graph entities)
Ownership of our own embedding model (not locked to Hive Mind’s choice)
Low-latency search (pgvector cosine similarity, no network hop at query time)
Independence from Hive Mind’s uptime
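
To make the "cosine similarity" claim concrete, the sketch below ranks a few toy vectors against a query in plain Python. It only illustrates the scoring that a pgvector cosine search performs inside the database (pgvector's `<=>` operator returns cosine distance, which is 1 minus this similarity). The entry names and 3-dimensional vectors are made up for the example; real entries would be 512-dimensional.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], entries: dict[str, list[float]], k: int = 3):
    """Return the k (entry_id, similarity) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), entry_id)
              for entry_id, vec in entries.items()]
    scored.sort(reverse=True)
    return [(entry_id, score) for score, entry_id in scored[:k]]


# Toy data: hypothetical entry IDs with tiny vectors.
entries = {
    "pricing": [1.0, 0.0, 0.0],
    "coupons": [0.8, 0.6, 0.0],
    "webhooks": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], entries, k=2)
```

Here "pricing" ranks first because it points in the same direction as the query, and "coupons" ranks second with a similarity of about 0.8.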

Consequences

Positive:

Zero cold-start — the brain is populated on day one with production-quality content
MemberIntel controls its own embedding model; can swap Voyage for a better model later
Search latency is database-level (pgvector cosine similarity), not API-level
Nightly sync catches new Hive Mind content

Negative / costs:

Initial seed is a batch job that takes minutes to hours depending on content volume
Nightly sync adds operational complexity (needs a cron/scheduler)
Embedding API costs (Voyage) for the initial seed and nightly re-embeds
Content can lag Hive Mind by up to 24 hours between nightly syncs

Mitigations:

The seed script (scripts/seed_brain.py) is idempotent — safe to re-run
Nightly sync uses incremental diffing (only embed new/changed content)
Voyage voyage-3-lite is cheap ($0.02/1M tokens); full re-embed is under $5
The tenant_id field on brain_entries preserves the option of per-customer isolation in V2
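
One way the incremental diffing and the cost figure above could be worked through is sketched below. It assumes the sync stores a content hash per entry and re-embeds only items whose hash is new or different. The function names and the hash-based approach are assumptions made for illustration, not a description of the actual sync job. The price constant is the $0.02 per 1M tokens quoted above.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of an entry's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(current: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return the IDs whose text is new or changed since the last sync."""
    return sorted(
        source_id
        for source_id, text in current.items()
        if stored_hashes.get(source_id) != content_hash(text)
    )


PRICE_PER_MILLION_TOKENS_USD = 0.02  # voyage-3-lite price quoted in the ADR


def embedding_cost_usd(tokens: int) -> float:
    """Embedding cost for a given token count at the quoted price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


# Hashes recorded by a previous run (hypothetical IDs and texts).
stored = {
    "doc-1": content_hash("old pricing text"),
    "doc-2": content_hash("coupon guide"),
}
# Content as extracted tonight: doc-1 changed, doc-2 unchanged, doc-3 is new.
current = {
    "doc-1": "new pricing text",
    "doc-2": "coupon guide",
    "doc-3": "brand new page",
}
to_embed = plan_sync(current, stored)

# At $0.02 per 1M tokens, a $5 budget covers 250M tokens.
budget_check = embedding_cost_usd(250_000_000)
```

Only doc-1 and doc-3 would be sent to the embedding API in this example. The cost helper shows the arithmetic behind the "under $5" figure: it holds as long as the full corpus is under roughly 250 million tokens.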

Alternatives considered

Build from scratch — rejected: months of curation, no day-one content, duplicates Hive Mind’s work
Query Hive Mind at chat-time — rejected: adds 200-500ms latency per query, Hive Mind becomes a single point of failure, no offline capability
Copy Hive Mind vectors directly — rejected: different embedding model dimensions, different index structure; re-embedding is cleaner than vector translation

For: Seth Shoultes, AI Engineer, Blair Williams, Santiago Perez Asis, Product Lead