ADR-0006: Hive Mind as Brain Seed Source
Status: Accepted
Date: 2026-05-12
Deciders: Seth (Lead Architect)
Context
Slice 1 needs a global brain — a vector store of MemberPress product knowledge that the chat endpoint can cite. Building this from scratch would require curating, scraping, and embedding 45K+ docs, 70K+ code chunks, and 13K+ graph entities across 11 Caseproof products.
The Hive Mind (~/Local Sites/caseproof-agent/deploy/hive-mind-mcp/) already has all of this content indexed and accessible via an MCP endpoint (hive-mind.caseproofagent.com/mcp). It exposes 48 MCP tools for querying Caseproof product knowledge.
The decision is whether to:
- Build MemberIntel’s brain from scratch (curate + scrape + embed)
- Seed from Hive Mind’s existing content (copy + re-embed with our own model)
- Query Hive Mind at chat-time (no local brain, real-time API calls)
Decision
Seed from Hive Mind with re-embedding. MemberIntel will:
- Call Hive Mind’s hive_format_context MCP tool to extract content
- Re-embed all extracted content with Voyage voyage-3-lite (512d) into MemberIntel’s own pgvector brain_entries table
- Run a nightly sync job to catch Hive Mind updates
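The extract and re-embed steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the contents of scripts/seed_brain.py: fetch_chunks and embed_batch are hypothetical callables standing in for the Hive Mind MCP call and the Voyage embedding call, and the BrainEntry fields are illustrative rather than the real brain_entries schema.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

EMBED_DIM = 512  # voyage-3-lite output size, as stated in the decision


@dataclass
class BrainEntry:
    # Hypothetical fields; the real brain_entries columns may differ.
    source_id: str
    text: str
    embedding: list[float]


def batched(items: Iterable[tuple[str, str]], size: int) -> Iterator[list[tuple[str, str]]]:
    """Yield consecutive lists of at most `size` items."""
    batch: list[tuple[str, str]] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def seed(
    fetch_chunks: Callable[[], Iterable[tuple[str, str]]],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 64,
) -> list[BrainEntry]:
    """Extract (source_id, text) pairs, embed them in batches, and build entries."""
    entries: list[BrainEntry] = []
    for batch in batched(fetch_chunks(), batch_size):
        vectors = embed_batch([text for _, text in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedder returned a different number of vectors than texts")
        for (source_id, text), vector in zip(batch, vectors):
            if len(vector) != EMBED_DIM:
                raise ValueError(f"expected {EMBED_DIM} dimensions, got {len(vector)}")
            entries.append(BrainEntry(source_id, text, vector))
    return entries
```

Passing the two external calls in as functions keeps the sketch runnable without network access; a real script would write each entry to the database instead of returning a list.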
This gives us:
- Immediate content coverage (45K+ docs, 70K+ code chunks, 13K+ graph entities)
- Ownership of our own embedding model (not locked to Hive Mind’s choice)
- Low-latency search (pgvector cosine similarity, no network hop to Hive Mind at query time)
- Independence from Hive Mind’s uptime
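To make the pgvector cosine-similarity search concrete, here is a hedged sketch of what such a lookup might look like. Only the table name brain_entries comes from this ADR; the column names (id, content, embedding) and the psycopg-style named parameters are assumptions. The `<=>` operator is pgvector's cosine-distance operator, so similarity is one minus its result. The pure-Python function shows the same calculation for reference.

```python
import math

# Assumed columns: id, content, embedding. Only brain_entries is named in the ADR.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %(query)s::vector) AS similarity
FROM brain_entries
ORDER BY embedding <=> %(query)s::vector
LIMIT %(limit)s
"""


def to_vector_literal(vector):
    """Format a sequence of numbers as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

A caller would embed the user question, pass to_vector_literal(embedding) as the query parameter, and read back the top rows ordered by distance.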
Consequences
Positive:
- Zero cold-start — the brain is populated on day one with production-quality content
- MemberIntel controls its own embedding model; can swap Voyage for a better model later
- Search latency is database-level (pgvector cosine similarity), not API-level
- Nightly sync catches new Hive Mind content
Negative / costs:
- Initial seed is a batch job that takes minutes to hours depending on content volume
- Nightly sync adds operational complexity (needs a cron/scheduler)
- Embedding API costs (Voyage) for the initial seed and nightly re-embeds
- Content can lag Hive Mind by up to 24 hours between nightly syncs
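The embedding cost can be estimated with simple arithmetic. The sketch below uses the $0.02 per 1M tokens price quoted under Mitigations; the total token volume is not stated in this ADR, so it is left as an input. At that price, a $5 budget covers 250M tokens.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # voyage-3-lite price quoted in this ADR


def embedding_cost_usd(total_tokens, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Estimated cost in USD of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million


def max_tokens_for_budget(budget_usd, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Approximate number of tokens that fit in `budget_usd`."""
    return round(budget_usd / price_per_million * 1_000_000)
```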
Mitigations:
- The seed script (scripts/seed_brain.py) is idempotent, so it is safe to re-run
- Nightly sync uses incremental diffing (only embed new/changed content)
- Voyage voyage-3-lite is cheap ($0.02/1M tokens); full re-embed is under $5
- The tenant_id field on brain_entries keeps per-customer isolation for V2
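One common way to get both idempotent re-runs and incremental diffing is to store a content hash next to each entry and embed only items whose hash is new or different. The sketch below illustrates that idea; it is an assumption about how the diff could work, not a description of the actual sync job, and the function names are hypothetical. It does not handle content deleted from Hive Mind.

```python
import hashlib


def content_hash(text):
    """Stable SHA-256 fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_ids(remote, stored_hashes):
    """Return the sorted ids whose text is new or differs from the stored hash.

    remote:        {source_id: current text from the upstream source}
    stored_hashes: {source_id: hash recorded at the last successful embed}
    """
    return sorted(
        source_id
        for source_id, text in remote.items()
        if stored_hashes.get(source_id) != content_hash(text)
    )
```

Once the stored hashes are updated after a successful embed, running the same diff again returns an empty list, which is what makes a repeated run a no-op.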
Alternatives considered
- Build from scratch — rejected: months of curation, no day-one content, duplicates Hive Mind’s work
- Query Hive Mind at chat-time — rejected: adds 200-500ms latency per query, Hive Mind becomes a single point of failure, no offline capability
- Copy Hive Mind vectors directly — rejected: different embedding model dimensions, different index structure; re-embedding is cleaner than vector translation