decision
ADR-0025: Tier-to-model assignment — Haiku for Free, Sonnet for Pro
Status: Proposed
Date: 2026-06-01
Deciders: Seth Shoultes (Lead Architect), Blair Williams (CEO)
Author: Seth Shoultes
Context
ADR-0002 locked the mechanism for model routing: the TIERS dict in src/memberintel/entitlement/tiers.py is the single source of truth for tier-to-model mapping, with CI guard rules enforcing that import anthropic and .model_id reads only happen in src/memberintel/llm/. ADR-0002 did not name which model each tier receives. That assignment has lived in TIERS as an implementation detail without a written decision behind it.
This ADR pins the assignment. It is being written in response to caseproof/project-management#368, which flagged that the Free-tier model choice is undocumented and that a bad Free experience may hurt conversion.
The current implementation is split routing: Free → Anthropic Claude Haiku, Pro → Anthropic Claude Sonnet. Earlier in the V1 cycle the Free-tier model was proposed as a locally-hosted Ollama-class open-source model (referenced in the 2026-05-11 Blair × Seth working session). That was revised 2026-05-20 — the locally-hosted approach was deferred to a future cost-management consideration and the V1 launch choice became Haiku. The revision was applied across the forward-looking specs and the executive summary in memberpress-intel#3; the 2026-05-11 meeting record carries a dated addendum noting the supersession.
The cost projection for the split routing is documented in the V1 Cost Discipline Review. The headline math:
| Component | Cost per Free user / month |
|---|---|
| 5 Haiku chat turns / mo | $0.085 |
| Weekly digest (Haiku) | $0.054 |
| Site analysis (Sonnet, weekly cached) | $0.336 |
| Cross-pollination amortized | $0.100 |
| Sync infra + storage + auth | $0.500 |
| Total Free user / month | ~$1.07 |
Target ceiling: $1.10 / Free user / month. The $1.07 estimate holds only if (a) site analysis stays Sonnet-cached weekly and never runs ad-hoc for a Free user, (b) chat input stays bounded at 12K tokens per turn, and (c) infra holds at $0.50. Precondition (c) is flagged optimistic: a realistic Cloud SQL + pgvector + Cloud Run + Secret Manager + audit-log GCP stack is more likely $0.70–$0.90, which would push the total to roughly $1.28–$1.48. The ~$1.30 figure used later in this ADR corresponds to the low end of that range, and even the low end exceeds the ceiling.
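The sensitivity of the total to the infra line can be checked with a few lines of Python. This is a minimal sketch: the component values are copied from the table above, while the names `COMPONENTS`, `CEILING`, and `total_cost` are illustrative and do not come from the codebase.

```python
# Per-Free-user monthly cost components in USD, copied from the cost table.
COMPONENTS = {
    "haiku_chat_turns": 0.085,
    "weekly_digest": 0.054,
    "site_analysis": 0.336,
    "cross_pollination": 0.100,
    "infra": 0.500,
}
CEILING = 1.10  # target ceiling per Free user per month


def total_cost(infra: float = COMPONENTS["infra"]) -> float:
    """Sum all components, optionally overriding the infra assumption."""
    non_infra = sum(v for k, v in COMPONENTS.items() if k != "infra")
    return round(non_infra + infra, 3)


# Compare the optimistic infra figure with both ends of the realistic range.
for infra in (0.50, 0.70, 0.90):
    total = total_cost(infra)
    status = "within" if total <= CEILING else "over"
    print(f"infra ${infra:.2f} -> total ${total:.3f} ({status} ceiling)")
```

Only the $0.50 infra assumption keeps the total under the $1.10 ceiling; both ends of the $0.70–$0.90 range land above it.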
The Free tier is designed to roughly break even at the V1 conversion floor. Burn: 50,000 Free users × $1.07 = $53,500 / month. Return: 5% Free → Pro conversion × 50,000 × $29 / month × 70% margin ≈ $50,750 / month gross. At exactly 5% the Free tier therefore runs about $2,750 / month short, and exact break-even sits near 5.3% conversion. The margin of safety is built from beating 5% conversion, not from infra savings.
Sonnet-for-all alternative, cost sanity: if Free users were routed to Sonnet instead of Haiku, the chat-turn cost component alone jumps from $0.017/turn to ~$0.34/turn. Five turns/month per Free user puts the monthly chat-turn cost at $1.70 (vs. $0.085 on Haiku), and the total Free-user cost climbs from $1.07 to roughly $2.70/month. At 50,000 Free users that’s $135,000/month, against the same $50,750/month gross at the floor conversion, so the Free tier loses ~$85,000/month at the conversion floor. Sonnet-for-all only works if Haiku-Free conversion is so visibly worse than Sonnet-Free conversion that Sonnet-Free converts at about 13.3% ($2.70 ÷ ($29 × 70%)), roughly 2.7 times the 5% floor. That’s a strong empirical claim with no current evidence behind it.
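The burn-versus-gross comparison for both routing options reduces to one formula. The sketch below uses only the figures stated in this section (50,000 Free users, $29 / month, 70% margin, 5% floor). The function names are illustrative, and it keeps the ADR's simplification that the Free user count is fixed.

```python
FREE_USERS = 50_000
PRO_PRICE = 29.0   # USD per Pro user per month
MARGIN = 0.70      # margin applied to Pro revenue
FLOOR = 0.05       # V1 Free -> Pro conversion floor


def monthly_burn(cost_per_free_user: float) -> float:
    """Total monthly cost of serving the Free tier."""
    return FREE_USERS * cost_per_free_user


def monthly_gross(conversion: float) -> float:
    """Margin-adjusted monthly revenue from converted Free users."""
    return conversion * FREE_USERS * PRO_PRICE * MARGIN


def break_even_conversion(cost_per_free_user: float) -> float:
    """Conversion rate at which gross equals burn; the user count cancels out."""
    return cost_per_free_user / (PRO_PRICE * MARGIN)


for label, cost in (("Haiku-Free", 1.07), ("Sonnet-Free", 2.70)):
    net = monthly_gross(FLOOR) - monthly_burn(cost)
    rate = break_even_conversion(cost)
    print(f"{label}: net at floor ${net:,.0f}/mo, break-even conversion {rate:.1%}")
```

Because the user count cancels, the break-even conversion rate depends only on per-user cost, price, and margin. That makes it a convenient single number to recompute once real cost telemetry arrives.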
The PM’s concern remains live. Haiku is cheaper than Sonnet, but it is also meaningfully weaker at reasoning and lower-quality on nuanced advisor work. If Haiku produces answers that Free users perceive as visibly worse, the conversion engine breaks two ways: (1) Free users bounce without converting because the product feels weak, or (2) Pro tier has to consistently outperform a baseline so low that Sonnet stops being the differentiator it could be. Neither is a stable equilibrium. The mitigation isn’t “use Sonnet for everyone”; it’s “instrument Free-tier quality continuously and tighten / loosen the runtime-tunable dials when the data says so.”
Decision
V1 ships with Anthropic Claude Haiku for Free tier and Anthropic Claude Sonnet for Pro tier. The mapping lives in the TIERS dict in src/memberintel/entitlement/tiers.py and is enforced server-side per ADR-0002. No client input can route a Free user to Sonnet. No environment-flag bypass exists. The model strings themselves are opaque ModelHandle values minted by the entitlement service — never plain strings handed to callers.
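As an illustration of the shape this implies, the sketch below shows a tier-keyed mapping that returns opaque handles. It is not the contents of tiers.py: the `Tier` enum, the `handle_for` helper, and the placeholder identifiers are all assumptions, and only the `TIERS` and `ModelHandle` names and the `.model_id` attribute come from this ADR and ADR-0002.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    FREE = "free"
    PRO = "pro"


@dataclass(frozen=True)
class ModelHandle:
    """Opaque token: callers pass it along, only the LLM layer reads model_id."""
    model_id: str = field(repr=False)  # kept out of repr so logs do not leak it


# Placeholder identifiers for illustration, not real Anthropic model IDs.
TIERS = {
    Tier.FREE: ModelHandle("haiku-placeholder"),
    Tier.PRO: ModelHandle("sonnet-placeholder"),
}


def handle_for(tier: Tier) -> ModelHandle:
    """Resolve from the server-side tier only; there is no client override argument."""
    return TIERS[tier]
```

The design point is that `handle_for` takes nothing but the tier, so there is no parameter through which a request payload or environment flag could select a different model.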
Beta-period observability is non-negotiable. Three signals are tracked weekly from beta launch through GA:
- Free → Pro conversion rate by cohort (60-day window). Floor: 5%.
- Free-tier eval-suite scores on MP-operator scenarios. The eval suite already covers tier-routing-safety as a release-blocking gate; this adds a Free-tier-quality gate that compares Haiku responses against a Sonnet baseline on a fixed eval set.
- Free-tier thumbs-feedback rate (👍 / 👎 ratio). Reviewed in monthly product reviews.
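One way to express the Free-tier-quality comparison is as a mean per-scenario score difference. This is a hedged sketch: the ADR does not define the scoring scale or the aggregation, so the 0–100 scale, the `quality_gap` function, and the scenario names below are assumptions.

```python
from statistics import mean


def quality_gap(haiku_scores: dict, sonnet_scores: dict) -> float:
    """Mean (Sonnet - Haiku) score across a fixed, shared set of eval scenarios."""
    if haiku_scores.keys() != sonnet_scores.keys():
        raise ValueError("both models must be scored on the same scenarios")
    return mean(sonnet_scores[name] - haiku_scores[name] for name in haiku_scores)


# Hypothetical scores on an assumed 0-100 scale.
haiku = {"churn-diagnosis": 70.0, "pricing-advice": 64.0}
sonnet = {"churn-diagnosis": 85.0, "pricing-advice": 89.0}
print(quality_gap(haiku, sonnet))
```

Rejecting mismatched scenario sets keeps the comparison on a fixed eval set, which is what makes the gap comparable from week to week.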
The escape hatches are runtime-tunable, not code changes. Per ADR-0001, the entitlement service exposes dial values for chat cap, site-analysis cadence, digest model, and per-operation token budgets. If Free-tier quality measurably hurts conversion, the dials tighten or loosen via config change, with no deploy and no wholesale model swap. The specific levers, in order of escalation:
- Loosen the Free chat cap (give Free users more turns; cost goes up, but if conversion is the bottleneck this is the cheapest test).
- Promote one operation from Haiku to Sonnet for Free users (e.g., the first chat turn of a new conversation, or the final-answer turn in a multi-turn flow). Surgical, instrumented.
- As a last resort, full Sonnet-for-Free for a defined cohort, with cost telemetry and conversion measured against a control cohort. If the lift doesn’t materially exceed cost, the change is reverted.
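The three levers above can be pictured as successive changes to one immutable dial record. The sketch is illustrative only: `FreeTierDials`, its field names, the operation and cohort labels, and `model_family` are hypothetical and are not the dial schema from ADR-0001.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FreeTierDials:
    chat_turn_cap: int = 5                        # turns per month, as in the cost table
    sonnet_promoted_ops: frozenset = frozenset()  # lever 2: operations promoted to Sonnet
    sonnet_cohort: Optional[str] = None           # lever 3: cohort fully on Sonnet


def model_family(dials: FreeTierDials, operation: str, cohort: Optional[str] = None) -> str:
    """Pick the model family for a Free-tier operation under the current dials."""
    if dials.sonnet_cohort is not None and cohort == dials.sonnet_cohort:
        return "sonnet"
    if operation in dials.sonnet_promoted_ops:
        return "sonnet"
    return "haiku"


baseline = FreeTierDials()
step1 = replace(baseline, chat_turn_cap=10)                                 # lever 1
step2 = replace(step1, sonnet_promoted_ops=frozenset({"first_chat_turn"}))  # lever 2
step3 = replace(step2, sonnet_cohort="beta-cohort-a")                       # lever 3
```

Each escalation step is a new config value rather than a code path, which matches the intent that these changes do not require a deploy. Users outside the named cohort keep the default routing and serve as the control.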
A reconsideration trigger is written into this ADR. If at 90 days post-GA the Free-tier conversion rate sits below 4% (one full point below floor) AND the eval-quality gap between Haiku and Sonnet on the eval suite is greater than 20 points, this ADR is reopened. Reopening means a written cost-vs-quality analysis goes to Blair with a specific proposal (which lever to pull from the list above, or a model-swap proposal with new cost projection). The reopen is mandatory at those thresholds; it’s not at Seth’s discretion.
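The trigger is a conjunction of conditions, which the following sketch encodes as a predicate. The function name and argument names are illustrative, and reading "at 90 days post-GA" as "90 days or more" is an assumption.

```python
def must_reopen(days_since_ga: int, conversion_rate: float, eval_gap_points: float) -> bool:
    """True when the ADR's mandatory reopen conditions are all met."""
    past_window = days_since_ga >= 90      # assumed reading of "at 90 days post-GA"
    low_conversion = conversion_rate < 0.04  # one full point below the 5% floor
    wide_gap = eval_gap_points > 20          # Haiku-vs-Sonnet eval gap, in points
    return past_window and low_conversion and wide_gap


print(must_reopen(90, 0.035, 25.0))  # both thresholds breached
print(must_reopen(90, 0.035, 20.0))  # gap exactly 20 is not "greater than 20"
```

Both thresholds are strict inequalities, so a 4.0% conversion rate or a gap of exactly 20 points does not force a reopen.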
Consequences
Positive:
- The unit economics roughly work at the V1 conversion floor. Free tier approximately breaks even at 5% conversion ($50,750 gross against $53,500 burn on the projected figures); every point of conversion above floor is upside.
- The cost-discipline invariants from ADR-0002 hold without modification. Free users cannot be silently routed to Sonnet.
- The runtime-tunable dials make Free-tier quality an operational lever, not a code-change lever. Tightening or loosening doesn’t ship through CI.
- The reconsideration trigger gives the PM concern a written escalation path. The decision isn’t a permanent commitment; it’s a 90-day operational hypothesis with measurable revisit conditions.
Negative / costs:
- Haiku is weaker at reasoning and lower-quality than Sonnet on nuanced advisor work. Free users get a worse advisor than Pro users. The product positioning has to make this honest without making Free feel punitive.
- The cost projection ($1.07 / Free user / month) holds under three preconditions, one of which (infra at $0.50) is flagged optimistic. Real infra is more likely $0.70–$0.90, pushing the total to ~$1.30 or more. The Free-tier break-even shifts: $1.30 × 50,000 = $65K/mo burn vs. ~$51K/mo gross at floor conversion, so Free loses ~$14K/mo at floor unless conversion clears about 6.4% ($1.30 ÷ ($29 × 70%)).
- Differential answer quality between Free and Pro adds an eval-suite axis. The eval suite has to cover both tiers on the same scenarios, with both Haiku-Free and Sonnet-Pro evaluated independently and the gap tracked over time. That’s more eval work.
Mitigations:
- The V1 cost discipline review explicitly flags the infra-cost assumption as optimistic and requires actual cost telemetry once beta is live. If infra runs hot, the conversion floor for break-even has to rise — which is a strategy conversation, not a routing change.
- The reconsideration trigger (4% conversion, 20-point eval gap) names the operational state that forces a revisit. It doesn’t paper over the risk; it commits to acting on it if the risk materializes.
- The dial-tightening playbook (above) gives the Product Lead a way to respond to a real conversion problem without going through engineering or an ADR rewrite.
Alternatives considered
- Sonnet for all tiers. Rejected on unit economics. At Free-tier scale (50,000 users × $2.70/mo each = $135K/mo) the Free tier loses ~$85K/mo at the V1 conversion floor unless Free → Pro conversion reaches roughly 13%, about 2.7 times the projected 5%. There is no empirical evidence that Sonnet-Free would convert better than Haiku-Free, and the cost downside is large enough that “let’s try Sonnet and see” is not an acceptable bet at this stage.
- Haiku for all tiers. Rejected on product differentiation. The Pro tier’s value proposition is “smarter advisor on your data”, which requires Sonnet’s nuance on Pro chat, advanced reports, and the brain editor. Demoting Pro to Haiku removes the differentiator and likely tanks Pro conversion entirely.
- Locally-hosted Ollama-class model for Free. Considered at the 2026-05-11 Blair × Seth working session. Deferred 2026-05-20: the GCP-hosted Anthropic Haiku approach is simpler, faster to ship, and avoids the operational cost of running model inference infrastructure. The Ollama-class path is preserved as a future cost-management consideration if Anthropic API pricing changes materially or the Free-tier user count scales past current modeling assumptions.
- Multi-provider routing (Vertex AI Gemini as a second provider). Proposed in ADR-0020 (still Proposed at time of writing). Orthogonal to this decision: ADR-0020 governs which providers sit behind the mitigation seam; this ADR governs which Anthropic model each tier receives. Once ADR-0020 is Accepted, a future ADR can introduce per-(tier, operation) provider routing where Vertex makes economic sense (e.g., wrap-up generation, classification workloads). That doesn’t change the Haiku-vs-Sonnet assignment for the Free tier; it adds a third option for narrow use cases.
- Surgical promotion: one Haiku operation upgraded to Sonnet for Free users. Considered but not chosen as the V1 default because it adds eval surface area without clear evidence of need. Listed as an escalation lever (above) for use during beta if the data calls for it.