ADR-0008: Voyage voyage-3-lite for Embeddings
Status: Accepted
Date: 2026-05-12
Deciders: Seth (Lead Architect)
Context
Slice 1 needs an embedding model to convert text chunks into vectors for pgvector similarity search. The choice of embedding model determines:
- Dimensionality — affects pgvector index size and search latency
- Quality — affects retrieval relevance, which directly impacts chat citation accuracy
- Cost — embedding API pricing per token
- Latency — embedding API response time affects search and chat endpoint latency
- Portability — whether we can swap models later without re-embedding the entire corpus
The V1 spec and design doc specify Voyage voyage-3-lite as the embedding model (512 dimensions). This ADR documents why.
Decision
Voyage voyage-3-lite (512 dimensions) as the Slice 1 embedding model.
Key properties:
- Dimensions: 512 (vs. 1024 for `voyage-3` or 1536 for `text-embedding-3-small`)
- Cost: $0.02/1M tokens (cheapest in the Voyage lineup)
- Quality: Strong retrieval benchmarks, especially for technical documentation (MTEB, BEIR)
- Latency: ~50-100ms per embedding call for short-to-medium texts
- API: Straightforward Python SDK (`voyageai`), simple batch support
The embedding column on brain_entries uses Vector(512) from pgvector. If we switch to a different model later, we re-embed the entire corpus and update the column type.
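Here is what the dimension choice means for raw storage. The pgvector README documents 4 bytes per dimension plus 8 bytes for a `vector` value, and 2 bytes per dimension plus 8 bytes for a `halfvec` value. The sketch below applies those figures. The row count (one row per doc and per code chunk from the seed estimate) is an assumption. Index structures (HNSW or IVFFlat) add overhead that this does not model.

```python
# Raw per-value storage for pgvector column types, per the pgvector README:
# vector = 4 bytes per dimension + 8 bytes; halfvec = 2 bytes per dimension + 8 bytes.
BYTES_PER_DIM = {"vector": 4, "halfvec": 2}
HEADER_BYTES = 8


def bytes_per_vector(dims: int, column_type: str = "vector") -> int:
    """Raw storage for one embedding value (excludes index and row overhead)."""
    return BYTES_PER_DIM[column_type] * dims + HEADER_BYTES


def corpus_mib(rows: int, dims: int, column_type: str = "vector") -> float:
    """Raw embedding storage for `rows` embeddings, in MiB."""
    return rows * bytes_per_vector(dims, column_type) / (1024 * 1024)


if __name__ == "__main__":
    rows = 45_000 + 70_000  # assumption: one row per doc and per code chunk
    for label, dims, col in [
        ("voyage-3-lite", 512, "vector"),
        ("voyage-3", 1024, "vector"),
        ("text-embedding-3-small", 1536, "vector"),
        ("voyage-3-lite as halfvec", 512, "halfvec"),
    ]:
        print(f"{label:26s} {bytes_per_vector(dims, col):5d} B/vector  "
              f"{corpus_mib(rows, dims, col):7.1f} MiB")
```

The 8-byte header makes the 512d/1024d ratio slightly under 2x (2056 vs. 4104 bytes per value), so "half the size" is a close approximation.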
Consequences
Positive:
- 512 dimensions is half the size of `voyage-3` (1024d): halved index size and faster cosine similarity queries
- $0.02/1M tokens makes the initial seed (~45K docs, 70K code chunks) cost under $5
- Voyage’s API is simple and well-documented; the Python SDK has first-class async support
- The `voyage-3-lite` model is purpose-built for retrieval, not general-purpose, which gives better recall for our use case
- Nightly re-embeds of changed content are cheap (typically <1000 chunks per sync)
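The cost claims above reduce to simple arithmetic at the per-token prices quoted in this ADR. The sketch below computes the token budget implied by the "under $5" seed estimate and the cost of a nightly sync. The 500 tokens per chunk is a hypothetical figure, not a measurement. `Decimal` avoids float rounding in currency math.

```python
from decimal import Decimal

# Per-token prices quoted in this ADR, in USD per 1M tokens.
PRICE_PER_M_TOKENS = {
    "voyage-3-lite": Decimal("0.02"),
    "voyage-3": Decimal("0.06"),
    "text-embedding-3-small": Decimal("0.02"),
}


def embedding_cost(tokens: int, model: str) -> Decimal:
    """USD cost of embedding `tokens` tokens with `model`."""
    return Decimal(tokens) / Decimal(1_000_000) * PRICE_PER_M_TOKENS[model]


def token_budget(usd: str, model: str) -> int:
    """Most tokens that fit within a USD budget (given as a string) for `model`."""
    return int(Decimal(usd) / PRICE_PER_M_TOKENS[model] * 1_000_000)


if __name__ == "__main__":
    # "Under $5" for the seed means under this many tokens at voyage-3-lite prices.
    print("seed token budget:", token_budget("5", "voyage-3-lite"))
    # Hypothetical nightly sync: 1000 chunks at an assumed 500 tokens each.
    print("nightly sync cost:", embedding_cost(1000 * 500, "voyage-3-lite"))
    print("same sync on voyage-3:", embedding_cost(1000 * 500, "voyage-3"))
```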
Negative / costs:
- Lower quality than `voyage-3` (1024d): there's a measurable but small recall gap on MTEB benchmarks
- Vendor lock-in to Voyage for embedding: switching models requires full re-embedding
- No on-device/self-hosted option (must call Voyage API)
- The 512d vectors can’t be compared to vectors from a different model
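Vectors from different models live in unrelated spaces. Matching dimensions do not make them comparable, so a comparison across models should be refused rather than silently computed. The sketch below tags each embedding with the model that produced it and computes cosine distance, which is what pgvector's `<=>` operator returns. `TaggedEmbedding` is a hypothetical type, and the ADR does not say whether `brain_entries` records the model name.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedEmbedding:
    """Hypothetical wrapper: an embedding plus the model that produced it."""
    model: str
    values: tuple[float, ...]


def cosine_distance(a: TaggedEmbedding, b: TaggedEmbedding) -> float:
    """1 - cosine similarity (the quantity pgvector's <=> operator computes).

    Raises ValueError for embeddings from different models: even at equal
    dimensionality, vectors from different models are not comparable.
    """
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model!r} with {b.model!r} embeddings")
    if len(a.values) != len(b.values):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm_a = math.sqrt(sum(x * x for x in a.values))
    norm_b = math.sqrt(sum(y * y for y in b.values))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```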
Mitigations:
- The `embed_texts()` function in `src/memberintel/api/retrieval/embed.py` is the single seam: all embedding goes through one module. Switching models means changing one file, re-running `scripts/seed_brain.py`, and, if the dimensionality changes, updating the `Vector(512)` column type.
- Voyage is best-in-class for retrieval embeddings (MTEB #1 as of 2026-05). The recall gap vs. `voyage-3` is ~2-3 percentage points on BEIR benchmarks, which is acceptable for a chat-with-citations product.
- Embeddings are already cached in `brain_entries.embedding`, so if cost becomes a concern, sync only needs to call the API for new or changed content.
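The following is a minimal sketch of how a single embedding seam and a sync-time change check might look. It is not the contents of `src/memberintel/api/retrieval/embed.py`. The batch size, `embed_texts_blocking`, `content_hash`, and `needs_embedding` are hypothetical. The change check assumes a per-entry content hash is stored next to the embedding, which the ADR does not specify. The client is passed in so tests can substitute a fake. In production it would be a `voyageai.AsyncClient`, whose `embed()` coroutine takes the texts plus `model=` and `input_type=` and returns an object with an `.embeddings` list.

```python
import asyncio
import hashlib
from collections.abc import Sequence
from typing import Any, Protocol

MODEL = "voyage-3-lite"
BATCH_SIZE = 128  # assumption: check Voyage's current per-request limits


class EmbedClient(Protocol):
    """The subset of voyageai.AsyncClient this sketch relies on."""

    async def embed(self, texts: list[str], model: str, input_type: str) -> Any: ...


async def embed_texts(
    client: EmbedClient,
    texts: Sequence[str],
    input_type: str = "document",  # use "query" when embedding search queries
    batch_size: int = BATCH_SIZE,
) -> list[list[float]]:
    """Embed `texts` in order, one API call per batch; MODEL is named only here."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = await client.embed(batch, model=MODEL, input_type=input_type)
        vectors.extend(result.embeddings)
    return vectors


def embed_texts_blocking(
    client: EmbedClient,
    texts: Sequence[str],
    input_type: str = "document",
    batch_size: int = BATCH_SIZE,
) -> list[list[float]]:
    """Hypothetical wrapper for synchronous scripts; not for use inside a running event loop."""
    return asyncio.run(embed_texts(client, texts, input_type, batch_size))


def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text, used to detect changed content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(current: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """IDs whose text is new or changed since the stored embedding was computed.

    `current` maps entry id -> text; `stored_hashes` maps entry id -> content_hash
    of the text behind the embedding already stored in brain_entries.embedding.
    """
    return [
        entry_id
        for entry_id, text in current.items()
        if stored_hashes.get(entry_id) != content_hash(text)
    ]
```

During a sync, only the IDs returned by `needs_embedding` would be passed to `embed_texts`. Everything else keeps its cached vector.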
Alternatives considered
- OpenAI `text-embedding-3-small` (1536d): rejected. Same per-token price ($0.02/1M tokens) but 3x the dimensions, so 3x the index size, with no significant quality advantage for technical documentation retrieval
- Voyage `voyage-3` (1024d): rejected. Higher cost ($0.06/1M tokens, 3x `voyage-3-lite`), 2x the index size, and a marginal quality improvement that doesn't justify the cost for V1
- Self-hosted sentence-transformers: rejected. Operational overhead (GPU provisioning, model serving), lower quality than Voyage, no managed API
- pgvector `halfvec` (float16): deferred. Could halve index size further but adds complexity; revisit if index size becomes a bottleneck on Cloud SQL
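Before revisiting `halfvec`, one rough way to gauge its precision cost is to round-trip embeddings through IEEE 754 half precision and compare. The sketch below approximates that conversion with the standard library's `struct` half-float format (`'e'`). It runs on a synthetic random vector, which is an assumption. A real evaluation should use actual `voyage-3-lite` embeddings and measure recall on real queries.

```python
import math
import random
import struct


def to_float16(values: list[float]) -> list[float]:
    """Round-trip values through IEEE 754 binary16 (approximating halfvec storage)."""
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a 512d embedding (assumption, not real model output).
    vec = [rng.gauss(0.0, 1.0) / math.sqrt(512) for _ in range(512)]
    half = to_float16(vec)
    print("max abs error:", max(abs(x - y) for x, y in zip(vec, half)))
    print("cosine(original, float16):", cosine_similarity(vec, half))
```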