decision

ADR-0016: Brain Mutation via Anthropic Tool Calls (`update_customer_brain`)

ADR-0016 (Accepted (deployed 2026-05-15), 2026-05-15): Brain Mutation via Anthropic Tool Calls (`update_customer_brain`).

▶ Watch the 1:24 summary — ADR-0016 — Brain Mutation via Anthropic Tool Calls (`update_customer_brain`), explained

Status: Accepted (deployed 2026-05-15)
Date: 2026-05-15
Deciders: Seth (Lead Architect), Blair (CEO)

Context

ADR-0015 establishes the four-document brain (SOUL, BIBLE, HEARTBEAT, MEMORY). HEARTBEAT is regenerated from sync data, but the other three are LLM-authored: when a conversation reveals a stable preference, a foundational fact about the site, or an insight worth remembering, something has to persist it.

The “something” is the load-bearing decision. Three options were on the table, each with very different operational and product consequences:

Tool calls. Define an update_customer_brain tool, let the LLM emit a tool_use event when it decides to persist; intercept the event server-side, write to the DB, return a tool_result, continue the response.
Post-hoc JSON parsing. Instruct the LLM via system prompt to emit a structured JSON block at the end of a response; parse it server-side after streaming completes.
Separate “summarize” call. After every conversation (or every N turns), make a follow-up call to a cheap model asking “what should we remember from this conversation?” and persist whatever it returns.

The Anthropic SDK already supported tool use in call_stream() (V1’s primary chat path) and call(). The pipeline already has structured event handling for streaming. The brain documents (SOUL/BIBLE) are read-modify-write — the writer must see the current document and produce a complete replacement — so the writer needs context, not just an emit instruction.

Decision

The LLM mutates brain documents by calling the update_customer_brain Anthropic tool. The server intercepts the tool_use event and executes the write. Streaming and non-streaming paths handle the continuation differently — see Plumbing below.

Tool definition (BRAIN_TOOL_DEFINITION in src/memberintel/api/brain/tool_handler.py):

{
  "name": "update_customer_brain",
  "description": "Save or update persistent knowledge about the user or their site...",
  "input_schema": {
    "type": "object",
    "properties": {
      "target": { "type": "string", "enum": ["soul", "bible", "memory"] },
      "content": { "type": "string" }
    },
    "required": ["target", "content"]
  }
}

Behavior by target:

soul — replace the entire SOUL document (read-modify-write; the LLM has read the current SOUL via the system prompt and emits the complete replacement).
bible — replace the entire BIBLE document for the active site (same pattern).
memory — append a new brain_entries row with collection='memory' and tenant_id=<user_id>.

Plumbing. call() and call_stream() gained an optional tools parameter that passes through to the Anthropic SDK. There are two paths:

Non-streaming chat(). Implements the full Anthropic tool-use loop: receive tool_use, dispatch to handle_brain_tool_call() (src/memberintel/api/brain/tool_handler.py), execute the write, send the tool_result back to the model in a follow-up call so the model can produce the post-write response. This is the protocol-correct path; the model “sees” the result of its save.
Streaming stream_chat_response() (src/memberintel/api/chat/sse.py). The SSE adapter accumulates tool_use events through the stream, then on the end event invokes the on_tool_call callback (wired in src/memberintel/api/chat/router.py). The callback calls handle_brain_tool_call() to persist the write. The stream does not feed a tool_result back to the model in the same turn — the user-facing response is whatever text the model emitted alongside the tool_use event. The save is a fire-and-forget side effect; the model’s next-turn context naturally reflects the new state on the following message. This is a deliberate simplification (streaming a continuation after a tool_result complicates the SSE protocol); the consequence is that the streamed answer can’t reference the just-completed save in the same response.

Commits: e47e972, ee27330, 70e0c6e.

Sanitization on the write path. Every update_customer_brain invocation passes through sanitize_brain_content() before any DB write — same path as the user-facing PUT endpoints in the brain-management UI (ADR-0018). The LLM is treated as untrusted input by the persistence layer.

Discretion is in the system prompt, not in code. The LLM is instructed to call the tool only when it genuinely learns something — not every turn. The server does not enforce a per-turn rate limit on tool calls; if the LLM is over-eager, that’s a prompt-tuning fix.

Consequences

Positive:

The persistence trigger is the model’s intent to remember, not a post-hoc heuristic. The model knows when it has learned something durable; the server doesn’t have to guess.
Read-modify-write works correctly. The LLM sees the current SOUL/BIBLE in the system prompt and produces a complete replacement — no merge conflicts, no field-level patching, no diff logic to maintain.
The same path is used by the brain-management UI (PUT endpoints) and by the LLM (tool calls). Sanitization, limits, and audit are wired in one place — handle_brain_tool_call() and the PUT handlers both call sanitize_brain_content().
The tool definition is a contract the LLM can see. If the schema changes, the LLM gets the new schema in the next call — no code-side translation layer.
Anthropic native: no JSON-extraction parser, no fragile end-of-response convention, no “what if the model didn’t close the JSON block.”

Negative / costs:

We are bound to Anthropic’s tool-use semantics. A model swap (per ADR-0005) to a provider with a different tool-use shape requires an adapter layer.
The LLM controls the write decision, including frequency. An overeager prompt or a hallucinating model could trigger many writes per conversation. Mitigated by the system-prompt instruction (“Do NOT call it on every message”) but not enforced server-side.
BIBLE/SOUL replacement is whole-document. If the LLM truncates content while “updating,” the prior content is gone. The brain-management UI shows the current document so a user can spot and fix this; there is no automatic version-rollback.
Tool calls increase the round-trip count of a single conversation turn (one LLM call → tool_use → server write → tool_result → LLM continuation). Latency budget for that turn rises proportionally.

Mitigations:

The Anthropic dependency is already accepted (ADR-0005). The tool-use binding is in the chat layer; a future adapter to swap providers (e.g., Bedrock’s tool-use API) is a known and bounded surface.
Sanitization clamps content length and strips control characters; the write transaction is bounded by the tier-aware memory limits (ADR-0018). A runaway model can’t write 10,000 memory entries.
Document replacement is acceptable because BIBLE/SOUL are short narrative documents (hundreds of chars, not MBs). The version column on user_souls and site_contexts increments on each write, so an audit trail exists even though full version history isn’t stored.
Latency: the tool call is asynchronous to the user-visible response stream; the user sees the model’s content stream continuously, with the tool call happening in the same turn but invisible.

Alternatives considered

Post-hoc JSON parsing. Rejected. The LLM has to remember to emit the block at the end of every response that warrants persistence, and keep the JSON well-formed in the presence of streaming and partial token cutoffs. Parser brittleness was the dominant risk. Also: post-hoc parsing means the model can’t see a tool_result and acknowledge the write to the user — the persistence is invisible.
Separate “summarize this conversation” follow-up call. Rejected. Doubles the LLM cost for every conversation regardless of whether anything was learned. The summarizer also has less context than the in-conversation model (no system prompt, no in-flight insight), so the quality bar drops. Tool calls let the same model that learned something be the one that persists it.
User-confirmed writes via UI. Considered for BIBLE only; rejected for MVP. The friction of “Save this insight? [Yes] [No]” on every conversational moment of learning would kill the self-improvement loop. The brain-management UI is the correction path, not the consent gate. Users can review and edit anything the LLM wrote.
Two tools — one for SOUL/BIBLE replace, one for MEMORY append. Considered. Rejected for surface-area economy: a single tool with a target enum keeps the prompt instruction shorter and the LLM’s decision simpler (“call the brain tool when X”), while the server-side dispatch is trivial.
Defer the tool-use path; use the brain-management PUT endpoints only. Rejected. Without LLM-driven mutation, “the AI remembers” is a manual user task — antithetical to the product bet. The user can’t be expected to write their own SOUL.

Amendment (2026-07-09, innovations#228): two purpose-built tools; site knowledge no longer chat-editable

As of 2026-07-09 (innovations#228), the chat brain-mutation surface is reshaped.

The generic update_customer_brain tool is retired and replaced by two purpose-built tools:

update_preferences (UPDATE_PREFERENCES_TOOL_DEFINITION in src/memberintel/api/brain/preferences_tool.py) — edits the user’s soul / communication preferences (tone, format, detail level). Whole-document replace, same read-modify-write pattern the original soul target used.
manage_memory (MANAGE_MEMORY_TOOL_DEFINITION in src/memberintel/api/brain/memory_tool.py) — add / list / forget the user’s memories, via an action enum.

Direct bible (site-knowledge) writes from chat are REMOVED. Site knowledge is system-derived (sync / onboarding) and is not chat-editable. The chat tool surface no longer exposes any path to hand-edit what the AI knows about a member’s site; such requests are answered in text explaining the data comes from synced sources. The member-facing REST write (PUT /api/v1/brain/bible) was also removed with #228 — BIBLE writes remain via sync/onboarding (service.update_bible) and the CF-Access-gated admin surface, completing the boundary.

Memory remains conversational-only — there is no Settings UI for it in V1.

This reverses the original “one tool with a target enum” surface-economy decision (see Alternatives → “Two tools”) in favor of two clearer tools, because the two intents now have materially different semantics (whole-document replace vs. add/list/forget) and different user-facing framing. It also aligns the chat tool surface with the ADR-0033 / #231 invisible-brain V1 boundary: a member may shape how the AI communicates (preferences) and manage what it remembers (memory), but may not hand-edit what it knows about their site.

Tool-selection behavior is pinned by tests/evals/test_preferences_memory_tool_selection.py.

For: S Seth Shoultes A AI Engineer B Blair Williams S Santiago Perez Asis P Product Lead