MemberIntel KB › role

Senior AI Engineer JD

Senior AI Engineer role definition: owns the AI substrate end-to-end (inference pipeline, retrieval, prompt versioning, eval suite, cost discipline, feedback loop) under Seth's architectural direction. Reports to Seth Shoultes. Hire pending — target close mid-June 2026.

Role: Senior AI Engineer, MemberIntel
Incumbent: (role unfilled — sourcing in progress; see public job posting)
Reports to: Seth Shoultes (Lead Architect, MemberIntel)
Peers: Ronald Reymundo (WP Dev — interim code-quality enforcer), matrixed engineering support from Mahmoud Saeed (IPJ) and Thomas Levy (IPJ Lead)
Effective: May 2026
References: MemberIntel SPEC v1, Seth — Lead Architect JD, V1 cost discipline review, the five locked ADRs in the V1 product repo at docs/adr/


Mission

Own the AI substrate of MemberIntel end-to-end: inference pipeline, retrieval, prompt versioning, eval suite, cost discipline, and the model-improvement feedback loop. Build a production system that has to be accurate, cheap to run, and steadily better month over month — for thousands of paying customers, weekly.

This is a shipping role, not a research role. We’re not training models or publishing papers. The product has to work, has to stay within budget, and has to keep getting better against measurable evals. The work that fills your week is engineering, not investigation.


Authority structure

Seth Shoultes (Lead Architect, MemberIntel) holds:

  • Material architecture choices (model routing strategy, brain isolation, vector store, hosting) — proposes to Blair, Blair approves
  • Engineering team direction, sprint scope, engineering velocity
  • Hiring decisions for the engineering team (including this role’s successor hires)
  • Vendor / tooling decisions for the engineering stack
  • Final code review of senior engineering work (until you’re ramped, then you own this)

Blair (CEO, MemberIntel) holds:

  • Product strategy, target customer, pricing
  • Final approval on PRDs and material architecture commitments
  • The release-gate threshold for the differentiation eval

Senior AI Engineer (this role) holds:

  • Inference pipeline implementation and operation
  • Retrieval implementation (pre-budgeted token counts, citation grounding, brain partitioning enforcement)
  • Prompt versioning, prompt regression testing, prompt drift detection
  • The eval suite: design, growth, CI integration, release-blocking gates
  • Cost telemetry and cost discipline enforcement (per-user caps, model routing enforcement at the wrapper layer, caching, abuse detection)
  • The model-improvement feedback loop (thumbs feedback ingestion, eval gap surfacing, prompt iteration)
  • Day-to-day code review for AI/ML-touching changes (once ramped, ~Phase 2)

Product Lead holds:

  • PRD authoring (you provide feasibility input, especially on cost and eval-coverage implications)
  • Customer-facing AI behavior decisions (citation styling, error UX, tier-cap messaging)
  • Quality bar / “are we shipping or not” gate

What you own

1. Inference pipeline (per SPEC §6, §8)

  • Implement and operate the llm.call(handle, operation, prompt, ...) wrapper as the only seam to the Anthropic SDK. CI guard rules enforce the boundary; you keep them green.
  • Tier-aware model routing via opaque ModelHandle minted by the entitlement service. Never accept a model string from a caller. Never let a Free request hit Sonnet. (ADR-0001, ADR-0002)
  • Structured tool-call orchestration for query_customer_metrics, analyze_site, search_global_brain, search_customer_brain, update_customer_brain. No Agent SDK at the runtime layer (per ADR-0005) — you own the dispatch.
  • Citation-grounded responses: every AI response cites the data it references (per SPEC §8.4). Hallucinations on financial data are trust-killers (Risk #7). The discipline lives in the prompt and the eval suite.
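The wrapper contract above can be sketched in a few lines. This is a minimal illustration, not the production implementation: the tier names, placeholder model strings, and return shape are assumptions, and the real routing table lives in the entitlement service.

```python
from dataclasses import dataclass

# Hypothetical tier -> model table; the real mapping lives in the entitlement
# service, and the model names here are placeholders.
_TIER_MODELS = {
    "free": "claude-haiku",
    "pro": "claude-sonnet",
}

@dataclass(frozen=True)
class ModelHandle:
    """Opaque handle minted by the entitlement service. Callers never see
    or supply a model string (ADR-0001, ADR-0002)."""
    _tier: str

def call(handle, operation, prompt, **kwargs):
    """The single seam to the Anthropic SDK. Rejects anything that is not a
    minted ModelHandle, so a caller can never name Sonnet directly."""
    if not isinstance(handle, ModelHandle):
        raise TypeError("llm.call accepts only a ModelHandle, never a model string")
    model = _TIER_MODELS[handle._tier]  # the only line where a model name is resolved
    # A real wrapper would invoke the SDK here, apply the per-operation
    # max_tokens cap, and emit cost telemetry before returning.
    return {"model": model, "operation": operation, "prompt": prompt}
```

The point of the shape: because the handle is opaque and minted server-side, "never let a Free request hit Sonnet" becomes a type-level property the CI guard rules can enforce, not a convention.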

2. Retrieval

  • Brain retrieval pre-filtered before the LLM — never dump full corpus. Token-budget the retrieval result, don’t post-truncate.
  • Per-tenant isolation enforced by RLS (per ADR-0003). Every query carries current_tenant_id. You verify the invariant in tests.
  • pgvector for V1, with the Pinecone migration path preserved as a connection-string swap.
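"Token-budget the retrieval result, don't post-truncate" can be sketched as a greedy pack over ranked chunks. The word-count tokenizer below is a stand-in for a real one, and the function name is illustrative:

```python
def budget_retrieval(chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Pack relevance-ranked chunks under a token budget *before* the LLM call.

    Whole chunks are skipped when they do not fit, rather than truncated
    mid-chunk after the fact. `count_tokens` here is a naive stand-in.
    """
    selected, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # skip the chunk; never slice it to fit
        selected.append(chunk)
        used += cost
    return selected
```

In the real pipeline the chunks would come pre-filtered from pgvector with `current_tenant_id` already applied by RLS; the budget step only decides how much of the filtered result the prompt may carry.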

3. Prompt versioning

  • Every prompt is versioned. Diffs reviewed. Production prompts live in code, not in a wiki.
  • Prompt regression tests run on every PR. Prompt drift is detectable, not surprising.
  • A new prompt or retrieval change ships only when the eval suite is green for it (per Critical Norm 4).
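One way drift detection can work, sketched under assumptions (the prompt name, version key, and pinning scheme are illustrative, not the repo's actual layout): pin a digest for every versioned prompt, and fail CI when the text changes without a version bump.

```python
import hashlib

# Prompts live in code, keyed by (name, version). Names are illustrative.
PROMPTS = {
    ("insight_card", "v3"): "You are MemberIntel. Cite every metric you reference.",
}

# In the real repo these digests would be committed literals, pinned at
# review time; computing them here just keeps the sketch self-contained.
PINNED = {
    key: hashlib.sha256(text.encode()).hexdigest() for key, text in PROMPTS.items()
}

def check_drift():
    """Fails loudly if any production prompt changed without a version bump."""
    for key, text in PROMPTS.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if PINNED.get(key) != digest:
            raise AssertionError(f"prompt drift detected for {key}; bump the version")
```

Run on every PR, this turns drift from something a user reports into something CI reports.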

4. Eval suite (the differentiation moat per Strategic Risk Landscape Risk #1)

  • Design real evals with adversarial cases. Vibes-based evals don’t ship.
  • Tier-routing-safety category: 100%-pass blocking (per arch-evals). Day-one CI gate.
  • Differentiation eval: MemberIntel scored vs. vanilla Sonnet on MP-operator scenarios. Phase 3 milestone; monthly executive review with Blair starting Phase 4. Methodology is yours; the release-gate pass-bar is Blair’s call (presented by you).
  • Eval coverage grows with every prompt or retrieval change. Eval debt is engineering debt — you don’t let it accumulate.
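A release-blocking gate can be as simple as pass-rate thresholds per category. A sketch, with the tier-routing bar from this document and an assumed bar for the second category (the real thresholds come from the arch-evals doc and Blair's call on the differentiation gate):

```python
# Pass-rate bars per eval category. Tier routing is 100% per Critical Norm 3;
# the citation bar below is an assumption for illustration.
BLOCKING_GATES = {
    "tier_routing_safety": 1.00,
    "citation_grounding": 0.99,
}

def gate_release(results):
    """`results` maps category -> list of booleans, one per eval case.
    Returns the failed gates; an empty list means the release may ship.
    A category with no cases counts as 0% — missing coverage blocks too."""
    failures = []
    for category, bar in BLOCKING_GATES.items():
        cases = results.get(category, [])
        rate = sum(cases) / len(cases) if cases else 0.0
        if rate < bar:
            failures.append((category, rate, bar))
    return failures
```

Treating missing coverage as a failing gate is what enforces Critical Norm 4: a change without evals cannot pass the gate by omission.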

5. Cost discipline (per SPEC §5.5 and the V1 cost discipline review)

  • Per-user caps enforced server-side at the entitlement layer; never client-trustable.
  • Hard max_tokens caps per operation, applied by the wrapper. Input-token ceilings are non-negotiable.
  • Aggressive caching (site analysis 7-day, insight cards until refresh, digest pre-computed).
  • Cost-per-cohort telemetry visible to the Product Lead via the BigQuery sink (per ADR-0004) — surface drift early.
  • Abuse prevention (rate limits, anomaly flagging).
  • The Free-tier circuit-breaker dials (chat cap, site-analysis cadence, digest model) are runtime-tunable by the Product Lead. You implement them so dial changes don’t require a deploy.
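"Dial changes don't require a deploy" usually means reading dial values from a runtime config source with a short TTL. A sketch with an injected loader (the dial names and defaults are illustrative; production would read a config service or a database row):

```python
import time

# Illustrative defaults; the real dial names and values come from the
# Product Lead and the SPEC, not this sketch.
DEFAULT_DIALS = {
    "free_chat_cap_per_day": 10,
    "free_site_analysis_cadence_days": 7,
    "free_digest_model": "small",
}

class CircuitBreakerDials:
    """Re-reads dial values at most every `ttl` seconds from an injected
    loader, so a dial change takes effect within one TTL, with no deploy."""

    def __init__(self, load, ttl=60.0):
        self._load, self._ttl = load, ttl
        self._cache, self._loaded_at = dict(DEFAULT_DIALS), float("-inf")

    def get(self, name):
        now = time.monotonic()
        if now - self._loaded_at > self._ttl:
            try:
                self._cache = {**DEFAULT_DIALS, **self._load()}
            except Exception:
                pass  # keep last-known-good values on a failed read
            self._loaded_at = now
        return self._cache[name]
```

The last-known-good fallback matters: a flaky config read must degrade to the previous dials, never to unlimited Free-tier usage.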

6. Feedback loop and continuous improvement

  • Thumbs feedback ingestion → eval-gap surfacing → prompt iteration → eval-suite growth. The cycle is engineering, not analyst work.
  • No fine-tuning in V1 (per SPEC §3 non-goals and §8.4). All learning happens through brain content + retrieval + prompt iteration.
  • Cross-pollination job (Phase 2): runs as the migration role only, anonymization step explicit (per cost-review pickup item).
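The "eval-gap surfacing" step of the loop can be mechanical rather than analyst work. A sketch, assuming a thumbs-feedback record shape (`rating`, `operation`, `failure_tag` are hypothetical field names) that clusters thumbs-downs into candidates for new eval cases:

```python
from collections import Counter

def surface_eval_gaps(feedback, min_count=3):
    """Group thumbs-down feedback by (operation, failure_tag) and return the
    clusters large enough to justify a new eval case, biggest first."""
    counts = Counter(
        (f["operation"], f["failure_tag"])
        for f in feedback
        if f.get("rating") == "down"
    )
    return [key for key, n in counts.most_common() if n >= min_count]
```

Each surfaced cluster becomes a candidate eval case, which then feeds prompt iteration and grows the suite — the cycle described above.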

7. Senior engineering review (Phase 2+)

  • Once ramped (target: ~end of August 2026), you take on senior code review for AI/ML-touching changes. Until then, Ronald enforces baseline quality and Seth backstops senior review.
  • You pair with the rest of engineering on the AI surfaces. Mahmoud is available ad hoc through Thomas for data-layer consultation if needed.

8. ADRs and material-choice surfaces

  • Material choices you propose (e.g., LLM tracing tool — current pending ADR-0006; future provider-portability evolution; eval framework choice) get written down as ADRs and routed through Seth, who routes through Blair if it crosses the material-choice threshold.

What you do NOT own

  • Product strategy, target customer, 18-month roadmap — Blair
  • PRD final approval — Blair (Product Lead drafts; you provide feasibility input)
  • Pricing strategy — Blair (you implement)
  • Sprint scope and engineering velocity decisions — Seth
  • Engineering hiring decisions — Seth (you participate in interview loops)
  • Customer onboarding flow, marketing site, beta program — Product Lead
  • Privacy / compliance program — Product Lead with outside counsel (you provide technical input)
  • Quality bar / ship gate — Product Lead holds the gate; you provide technical readiness
  • Cross-functional coordination outside engineering — Product Lead
  • Project tracking infrastructure — Santiago

If asked to weigh in on any of these, route back to the right owner. Stay in your lane.


Critical role norms

  1. Cost discipline first, scale second. Per SPEC §5.5 — hard token caps, aggressive caching, weekly cost-per-cohort review during early launch. Don’t optimize for performance at the expense of unit economics. The Free-tier model breaks even AT the floor conversion target — your discipline determines whether we’re above or below the line.
  2. Citations are non-negotiable. Every AI response cites the data it references. Hallucinations on financial data are trust-killers. The discipline lives in the prompt AND the eval suite — never just one.
  3. Free-tier server-side enforcement of model routing. Never accidentally route a Free user to Sonnet. The entitlement service is the single source of truth. CI guard rules enforce it. You keep them green.
  4. Eval gate before any prompt or retrieval change ships. No exceptions. If eval coverage doesn’t exist for a change, you build it before you ship the change.
  5. No fine-tuning in V1. All learning happens through brain content + retrieval + prompt iteration.
  6. No LangChain. No Agent SDK at the runtime layer. Per SPEC §8.4 and ADR-0005. You own the orchestration. Internal dev tooling under tools/ may use the Agent SDK; runtime under src/ may not.
  7. ADRs for material choices. When you propose a vendor/tooling/architecture change with non-trivial blast radius, write the ADR. Future-you and the team need them.
  8. Disagreements with Seth go to Blair within 48 hours. No silent escalation drift. If something not in the decision-rights matrix comes up, surface it fast.
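Norm 6's boundary (Agent SDK allowed under tools/, forbidden under src/) is checkable mechanically. A guard-rule sketch — the module names are illustrative and the real rules live in CI config:

```python
# Illustrative forbidden module names; the real list lives in the CI guard rules.
FORBIDDEN = ("langchain", "agent_sdk")

def scan_sources(files):
    """`files` maps repo-relative path -> source text. Flags forbidden imports
    under src/ (runtime) while leaving tools/ (internal dev tooling) alone."""
    offenders = []
    for path, text in files.items():
        if not path.startswith("src/"):
            continue  # Agent SDK is permitted under tools/, per Norm 6
        if any(f"import {mod}" in text or f"from {mod}" in text for mod in FORBIDDEN):
            offenders.append(path)
    return sorted(offenders)
```

Wired to a repo walk in CI, a nonempty result fails the build — which is what "you keep them green" means in practice.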

Success measures (12-month)

Measure — Target
  • Hired and ramped — Mid-June close, productive by mid-August
  • Eval suite operational with release-blocking categories — Before Phase 1 ends (per SPEC §13)
  • Differentiation eval delivers a measurable advantage over vanilla Sonnet on MP-operator scenarios — At GA (per SPEC Risk #1 mitigation)
  • Cost-per-Free-user steady state — At or below $1.10/mo (per SPEC §5.4)
  • Cost-per-Pro-user steady state — At or below $12/mo (per SPEC §5.4)
  • Hallucination rate on financial-data answers — < 1% on the eval suite
  • Tier-routing-safety eval — 100% pass, every CI run, every release
  • Prompt-drift incidents that reached production — 0 (detection should be in CI, not in user reports)
  • ADRs for material AI/ML choices — One per material change, all in the V1 repo

Reporting cadence

Cadence — Audience — Format
  • Daily — Seth — Async: eval-suite status, anything red in cost telemetry
  • Weekly — Seth — 30-min sync: sprint status, blockers, upcoming material choices
  • Monthly (from Phase 4) — Blair — Differentiation eval review (30 min); Seth attends
  • Quarterly — Blair + Seth — Architectural review: what's drifted, what's improving

What “good” looks like in this role

  • The inference pipeline is boring. Citations are correct. Cost telemetry is green. Eval gate is doing its job.
  • A new prompt change ships in a day, not a week, because the eval suite tells you whether it’s safe before you have to ask anyone.
  • Free-tier cost stays at $1.07 ± $0.05 across cohorts. Pro-tier stays at $10.75 ± $0.50.
  • Prompt drift is caught in CI, not in a customer report.
  • The Product Lead trusts the AI behavior enough to instrument new conversion surfaces against it without asking.
  • Seth is freed up for architecture, not pulled into prompt debugging.
  • When Blair asks “is the brain working?” the answer is a number from the differentiation eval, not a vibes-based read.
  • The Free-tier circuit-breaker dials work as designed — Product Lead can tune them without paging engineering.
  • You’ve left no ADR debt and no eval debt. New work doesn’t pay interest on old shortcuts.

Document version: Draft v1 — to be reviewed with the incoming Senior AI Engineer before finalization. Decision-rights table requires sign-off from Blair, Seth, and the new hire before this JD goes operational.

For: AI Engineer, Seth Shoultes, Blair Williams