ADR-0008: Voyage voyage-3-lite for Embeddings

Status: Accepted
Date: 2026-05-12
Deciders: Seth (Lead Architect)

Context

Slice 1 needs an embedding model to convert text chunks into vectors for pgvector similarity search. The choice of embedding model determines:

Dimensionality — affects pgvector index size and search latency
Quality — affects retrieval relevance, which directly impacts chat citation accuracy
Cost — embedding API pricing per token
Latency — embedding API response time affects search and chat endpoint latency
Portability — whether we can swap models later without re-embedding the entire corpus

The V1 spec and design doc specify Voyage voyage-3-lite as the embedding model (512 dimensions). This ADR documents why.

Decision

Voyage voyage-3-lite (512 dimensions) as the Slice 1 embedding model.

Key properties:

Dimensions: 512 (vs. 1024 for voyage-3 or 1536 for text-embedding-3-small)
Cost: $0.02/1M tokens (cheapest in the Voyage lineup)
Quality: Strong retrieval benchmarks, especially for technical documentation (MTEB, BEIR)
Latency: ~50-100ms per embedding call for short-to-medium texts
API: Straightforward Python SDK (voyageai), simple batch support

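The embedding column on brain_entries uses Vector(512) from pgvector. If we switch to a different model later, we must re-embed the entire corpus, because vectors from different models are not comparable. If the new model also has a different dimensionality, the column type must change as well.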
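Because the column is fixed at 512 dimensions, a vector of any other length cannot be stored in it. A minimal sketch of a guard that could run before a write is shown below. The helper name validate_embedding and the constant EMBEDDING_DIM are hypothetical and do not come from the codebase.

```python
import math
from typing import Sequence

# Assumption: mirrors Vector(512) on brain_entries.embedding.
EMBEDDING_DIM = 512


def validate_embedding(
    vector: Sequence[float], expected_dim: int = EMBEDDING_DIM
) -> list[float]:
    """Return the vector as a list of floats, or raise if it cannot be stored."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    values = [float(x) for x in vector]
    # NaN or infinite components would poison cosine-distance ordering.
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values
```

A guard like this would fail fast if the model were swapped without also changing the column type.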

Consequences

Positive:

512 dimensions is half the size of voyage-3 (1024d) — halved index size, faster cosine similarity queries
$0.02/1M tokens makes the initial seed (~45K docs, 70K code chunks) cost under $5
Voyage’s API is simple and well-documented; the Python SDK has first-class async support
The voyage-3-lite model is purpose-built for retrieval, not general-purpose — better recall for our use case
Nightly re-embeds of changed content are cheap (typically <1000 chunks per sync)
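The cost claims above can be sanity-checked with simple arithmetic. The sketch below uses the price and corpus counts from this ADR. The average of 500 tokens per chunk in the nightly example is an illustrative assumption, not a measured figure.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # voyage-3-lite price quoted in this ADR
SEED_DOCS = 45_000                   # approximate counts from this ADR
SEED_CODE_CHUNKS = 70_000
SEED_BUDGET_USD = 5.00


def embedding_cost_usd(
    total_tokens: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """API cost in USD for embedding total_tokens tokens."""
    return total_tokens / 1_000_000 * price_per_million


def max_avg_tokens_per_item(
    budget_usd: float,
    items: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """Largest average item size (in tokens) that still fits the budget."""
    return budget_usd / price_per_million * 1_000_000 / items


if __name__ == "__main__":
    items = SEED_DOCS + SEED_CODE_CHUNKS
    limit = max_avg_tokens_per_item(SEED_BUDGET_USD, items)
    print(f"seed stays under ${SEED_BUDGET_USD:.2f} if items average "
          f"fewer than {limit:.0f} tokens")

    # Assumption: 1000 changed chunks at ~500 tokens each for a nightly sync.
    nightly = embedding_cost_usd(1_000 * 500)
    print(f"nightly re-embed: about ${nightly:.2f}")
```

In other words, the under-$5 seed estimate holds as long as the roughly 115K items average below about 2,170 tokens each.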

Negative / costs:

Lower quality than voyage-3 (1024d) — there’s a measurable but small recall gap on MTEB benchmarks
Vendor lock-in to Voyage for embedding — switching models requires full re-embedding
No on-device/self-hosted option (must call Voyage API)
The 512d vectors can’t be compared to vectors from a different model

Mitigations:

The embed_texts() function in src/memberintel/api/retrieval/embed.py is the single seam — all embedding goes through one module. Switching models means changing one file and re-running scripts/seed_brain.py.
Voyage is best in class for retrieval embeddings (MTEB #1 as of 2026-05). The recall gap vs. voyage-3 is ~2-3 percentage points on BEIR benchmarks, which is acceptable for a chat-with-citations product.
Embeddings are already persisted in brain_entries.embedding, so the API is called only for new or changed content during sync. This keeps ongoing cost low even if the corpus grows.
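The single-seam and embed-only-new-content ideas can be sketched together. This is not the actual contents of embed.py. The signature of embed_texts, the content-hash key, and the in-memory `stored` dict (standing in for brain_entries.embedding) are all assumptions made for illustration. The Voyage call uses the voyageai SDK's Client.embed and is imported lazily so the rest of the sketch runs without the package installed.

```python
import hashlib
from typing import Callable, Sequence

# A backend maps a batch of texts to one vector per text.
Embedder = Callable[[Sequence[str]], list[list[float]]]


def voyage_embedder(texts: Sequence[str]) -> list[list[float]]:
    """Backend that calls Voyage; needs voyageai and a VOYAGE_API_KEY."""
    import voyageai  # lazy import: only required when this backend is used

    client = voyageai.Client()
    result = client.embed(
        list(texts), model="voyage-3-lite", input_type="document"
    )
    return result.embeddings


def content_key(text: str) -> str:
    """Hypothetical cache key: SHA-256 of the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_texts(
    texts: Sequence[str],
    stored: dict[str, list[float]],
    embedder: Embedder = voyage_embedder,
) -> list[list[float]]:
    """Embed texts, calling the backend only for content not yet stored."""
    keys = [content_key(t) for t in texts]
    missing: dict[str, str] = {}
    for key, text in zip(keys, texts):
        if key not in stored and key not in missing:
            missing[key] = text
    if missing:
        vectors = embedder(list(missing.values()))
        for key, vector in zip(missing, vectors):
            stored[key] = vector
    return [stored[key] for key in keys]
```

With this shape, swapping models means replacing the backend function. Callers keep using embed_texts unchanged, and a full re-embed amounts to clearing the stored vectors.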

Alternatives considered

OpenAI text-embedding-3-small (1536d) — rejected: same API price ($0.02/1M tokens) but 3x the dimensions, so roughly 3x the index size and storage cost; no significant quality advantage for technical documentation retrieval
Voyage voyage-3 (1024d) — rejected: higher cost ($0.06/1M tokens), 2x the index size, marginal quality improvement that doesn’t justify the cost for V1
Self-hosted sentence-transformers — rejected: operational overhead (GPU provisioning, model serving), lower quality than Voyage, no managed API
pgvector halfvec (float16) — deferred: could halve index size further but adds complexity; revisit if index size becomes a bottleneck on Cloud SQL
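The index-size comparisons above can be approximated from pgvector's documented per-value storage: 4 × dimensions + 8 bytes for vector and 2 × dimensions + 8 bytes for halfvec. The sketch below applies those figures to the seed corpus size from this ADR. It covers raw vector storage only. Index overhead (for example, HNSW graph links) is excluded, so actual sizes on Cloud SQL will be larger.

```python
SEED_ROWS = 45_000 + 70_000  # docs + code chunks, per this ADR

BYTES_PER_DIM = {"vector": 4, "halfvec": 2}  # float32 vs. float16
HEADER_BYTES = 8                             # per-value overhead in pgvector


def vector_bytes(dims: int, column_type: str = "vector") -> int:
    """Storage for one embedding value of the given pgvector type."""
    return BYTES_PER_DIM[column_type] * dims + HEADER_BYTES


def corpus_bytes(rows: int, dims: int, column_type: str = "vector") -> int:
    """Raw embedding storage for the whole corpus, excluding index overhead."""
    return rows * vector_bytes(dims, column_type)


OPTIONS = [
    ("voyage-3-lite, vector(512)", 512, "vector"),
    ("voyage-3, vector(1024)", 1024, "vector"),
    ("text-embedding-3-small, vector(1536)", 1536, "vector"),
    ("voyage-3-lite, halfvec(512)", 512, "halfvec"),
]

if __name__ == "__main__":
    baseline = corpus_bytes(SEED_ROWS, 512)
    for name, dims, column_type in OPTIONS:
        size = corpus_bytes(SEED_ROWS, dims, column_type)
        print(f"{name:<38} {size / 2**20:8.1f} MiB  {size / baseline:.2f}x")
```

The ratios come out close to the 2x, 3x, and one-half figures quoted in the alternatives, with small deviations from the fixed 8-byte header.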

For: Seth Shoultes, AI Engineer, Blair Williams, Santiago Perez Asis, Product Lead