ADR-0008: Voyage voyage-3-lite for Embeddings

Status: Accepted
Date: 2026-05-12
Deciders: Seth (Lead Architect)

Context

Slice 1 needs an embedding model to convert text chunks into vectors for pgvector similarity search. The choice of embedding model determines:

  1. Dimensionality — affects pgvector index size and search latency
  2. Quality — affects retrieval relevance, which directly impacts chat citation accuracy
  3. Cost — embedding API pricing per token
  4. Latency — embedding API response time affects search and chat endpoint latency
  5. Portability — whether we can swap models later without re-embedding the entire corpus

The V1 spec and design doc specify Voyage voyage-3-lite as the embedding model (512 dimensions). This ADR documents why.

Decision

Voyage voyage-3-lite (512 dimensions) as the Slice 1 embedding model.

Key properties:

  • Dimensions: 512 (vs. 1024 for voyage-3 or 1536 for text-embedding-3-small)
  • Cost: $0.02/1M tokens (cheapest in the Voyage lineup)
  • Quality: Strong retrieval benchmarks, especially for technical documentation (MTEB, BEIR)
  • Latency: ~50-100ms per embedding call for short-to-medium texts
  • API: Straightforward Python SDK (voyageai), simple batch support

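The embedding column on brain_entries uses Vector(512) from pgvector. If we switch to a different model later, we must re-embed the entire corpus, because vectors from different models are not comparable, and change the column type if the new model's dimensionality differs.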
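To make the link between dimensionality and storage concrete, the sketch below estimates raw vector bytes for the corpus. It relies on pgvector's documented storage cost of 4 × dimensions + 8 bytes per vector value. The row count reuses the seed estimate given under Consequences (treating each doc as one row, which is an assumption), and the function names are hypothetical. Real index size will be larger because index overhead is ignored here.

```python
# Rough estimate of raw pgvector storage per embedding model.
# Assumption: a pgvector `vector` value takes 4 * dims + 8 bytes
# (float32 components plus a small header); index overhead is ignored.

ROWS = 45_000 + 70_000  # docs + code chunks from the seed estimate


def vector_bytes(dims: int) -> int:
    """Bytes for one pgvector `vector` value with `dims` float32 components."""
    return 4 * dims + 8


def corpus_mib(dims: int, rows: int = ROWS) -> float:
    """Raw vector storage for the whole corpus, in MiB."""
    return vector_bytes(dims) * rows / (1024 * 1024)


if __name__ == "__main__":
    for name, dims in [
        ("voyage-3-lite", 512),
        ("voyage-3", 1024),
        ("text-embedding-3-small", 1536),
    ]:
        print(f"{name:24s} {dims:5d}d  {corpus_mib(dims):8.1f} MiB")
```

Because the per-vector header is only 8 bytes, storage scales almost linearly with dimensions: 1024d is just under 2x the 512d footprint and 1536d just under 3x.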

Consequences

Positive:

  • 512 dimensions is half the size of voyage-3 (1024d) — halved index size, faster cosine similarity queries
  • $0.02/1M tokens makes the initial seed (~45K docs, 70K code chunks) cost under $5
  • Voyage’s API is simple and well-documented; the Python SDK has first-class async support
  • The voyage-3-lite model is purpose-built for retrieval, not general-purpose — better recall for our use case
  • Nightly re-embeds of changed content are cheap (typically <1000 chunks per sync)
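The cost figures above can be checked with a short calculation. The sketch below is a back-of-the-envelope estimate only: the average chunk size of 500 tokens is a hypothetical value, not a measurement, and each doc is counted as a single chunk. The price and chunk counts come from this ADR; the function names are illustrative.

```python
# Back-of-the-envelope embedding cost at $0.02 per 1M tokens.
# AVG_TOKENS_PER_CHUNK is a hypothetical value, not a measured one.

PRICE_PER_MILLION_USD = 0.02      # voyage-3-lite price quoted in this ADR
SEED_CHUNKS = 45_000 + 70_000     # docs + code chunks in the initial seed
NIGHTLY_CHUNKS = 1_000            # upper bound on changed chunks per sync
AVG_TOKENS_PER_CHUNK = 500        # assumption for illustration only


def embedding_cost_usd(chunks: int, avg_tokens: int = AVG_TOKENS_PER_CHUNK) -> float:
    """API cost in USD to embed `chunks` chunks of `avg_tokens` tokens each."""
    return chunks * avg_tokens * PRICE_PER_MILLION_USD / 1_000_000


def max_avg_tokens_for_budget(budget_usd: float, chunks: int) -> float:
    """Largest average chunk size (in tokens) that stays within the budget."""
    return budget_usd * 1_000_000 / (PRICE_PER_MILLION_USD * chunks)


if __name__ == "__main__":
    print(f"seed:    ${embedding_cost_usd(SEED_CHUNKS):.2f}")
    print(f"nightly: ${embedding_cost_usd(NIGHTLY_CHUNKS):.4f}")
    limit = max_avg_tokens_for_budget(5, SEED_CHUNKS)
    print(f"$5 budget holds up to {limit:.0f} tokens/chunk")
```

Under the 500-token assumption the seed costs about $1.15 and a 1,000-chunk nightly sync about one cent. The under-$5 claim holds for any average chunk size below roughly 2,170 tokens.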

Negative / costs:

  • Lower quality than voyage-3 (1024d) — there’s a measurable but small recall gap on MTEB benchmarks
  • Vendor lock-in to Voyage for embedding — switching models requires full re-embedding
  • No on-device/self-hosted option (must call Voyage API)
  • The 512d vectors can’t be compared to vectors from a different model

Mitigations:

  • The embed_texts() function in src/memberintel/api/retrieval/embed.py is the single seam — all embedding goes through one module. Switching models means changing one file and re-running scripts/seed_brain.py.
  • Voyage is best-in-class for retrieval embeddings (MTEB #1 as of 2026-05). The recall gap vs. voyage-3 is ~2-3 percentage points on BEIR benchmarks, which is acceptable for a chat-with-citations product.
  • Ongoing cost is bounded: embeddings are already persisted in brain_entries.embedding, so sync only needs to call the API for new or changed content.
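The single-seam and embed-only-what-changed ideas can be sketched as follows. This is a minimal illustration, not the actual contents of embed.py: the embed_texts() signature, the Embedder protocol, the content-hash comparison, and the in-memory store standing in for brain_entries are all assumptions. FakeEmbedder replaces the Voyage API so the example runs offline.

```python
# Sketch of a single embedding seam with content-hash change detection.
# Everything here is illustrative: the real embed_texts() signature is not
# shown in this ADR, and `store` is an in-memory stand-in for brain_entries.
import hashlib
from typing import Protocol

EMBED_DIMS = 512  # must match the Vector(512) column on brain_entries


class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class FakeEmbedder:
    """Deterministic stand-in; a real backend would call the Voyage API."""

    def __init__(self) -> None:
        self.calls = 0

    def embed(self, texts: list[str]) -> list[list[float]]:
        self.calls += 1
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Repeat the 32-byte digest to fill EMBED_DIMS components in [0, 1].
            vectors.append([digest[i % len(digest)] / 255 for i in range(EMBED_DIMS)])
        return vectors


def embed_texts(texts: list[str], backend: Embedder) -> list[list[float]]:
    """The one place that talks to an embedding backend."""
    vectors = backend.embed(texts)
    for vec in vectors:
        if len(vec) != EMBED_DIMS:
            raise ValueError(f"expected {EMBED_DIMS} dims, got {len(vec)}")
    return vectors


def content_hash(text: str) -> str:
    """Stable fingerprint used to detect changed chunk content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync(
    chunks: dict[str, str],
    store: dict[str, tuple[str, list[float]]],
    backend: Embedder,
) -> int:
    """Embed only new or changed chunks; return how many were embedded."""
    changed = [
        cid for cid, text in chunks.items()
        if cid not in store or store[cid][0] != content_hash(text)
    ]
    if not changed:
        return 0
    vectors = embed_texts([chunks[cid] for cid in changed], backend)
    for cid, vec in zip(changed, vectors):
        store[cid] = (content_hash(chunks[cid]), vec)
    return len(changed)
```

Swapping models in this arrangement means providing a different Embedder and updating EMBED_DIMS. The dimension check in embed_texts() then fails fast if the backend and the column definition drift apart.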

Alternatives considered

  • OpenAI text-embedding-3-small (1536d) — rejected: same API price ($0.02/1M tokens) but 3x the dimensions, so roughly 3x the index size and correspondingly higher storage and query cost; no significant quality advantage for technical documentation retrieval
  • Voyage voyage-3 (1024d) — rejected: higher cost ($0.06/1M tokens), 2x the index size, marginal quality improvement that doesn’t justify the cost for V1
  • Self-hosted sentence-transformers — rejected: operational overhead (GPU provisioning, model serving), lower quality than Voyage, no managed API
  • pgvector halfvec (float16) — deferred: could halve index size further but adds complexity; revisit if index size becomes a bottleneck on Cloud SQL
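For the deferred halfvec option, the sketch below illustrates the storage trade-off using only the Python standard library. It packs one 512-dimensional vector as float32 and as float16, then measures how closely the float16 round trip matches the original. The random unit vector is a stand-in for a real embedding, so this shows the mechanics of the precision loss and says nothing about recall on actual Voyage vectors.

```python
# Illustration of float32 vs float16 storage for one 512-d vector.
# Standard library only; struct's "e" format is IEEE 754 half precision.
import math
import random
import struct

DIMS = 512


def pack(vec: list[float], fmt: str) -> bytes:
    """Pack a vector with struct format 'f' (float32) or 'e' (float16)."""
    return struct.pack(f"<{len(vec)}{fmt}", *vec)


def unpack(data: bytes, fmt: str) -> list[float]:
    """Inverse of pack() for the same format character."""
    size = struct.calcsize(fmt)
    return list(struct.unpack(f"<{len(data) // size}{fmt}", data))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Random unit-length vector as a stand-in for an embedding (assumption).
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(DIMS)]
norm = math.sqrt(sum(x * x for x in raw))
vec = [x / norm for x in raw]

f32 = pack(vec, "f")
f16 = pack(vec, "e")
restored = unpack(f16, "e")

print(len(f32), len(f16))      # payload bytes per vector
print(cosine(vec, restored))   # similarity after the float16 round trip
```

The float16 payload is exactly half the float32 payload. Whether the rounding error matters for retrieval would need to be measured on the real corpus before adopting halfvec.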

For: Seth Shoultes, AI Engineer, Blair Williams, Santiago Perez Asis, Product Lead