MemberIntel KB

Architecture Overview & Tech Choices

Covers GCP vs Heroku vs DigitalOcean for hosting, then deep-dives per-tenant isolation strategy using shared-schema RLS as the V1 foundation.

This is MemberIntel — an AI-powered SaaS advisor product Blair Williams is building at Caseproof, layered on top of MemberPress. The documents are a fairly mature planning packet: three versions of the spec (V1, V1.5, V2), JDs for the two key leads, a phased team ramp plan, and a decision-rights matrix governing how the two leads collaborate.

The product, in one paragraph. MemberIntel connects to a MemberPress customer’s site and Stripe, gives them a dashboard of membership metrics, and runs an AI chat that gives MP-specific business and technical advice grounded in their actual data. The architectural centerpiece is a two-tier “brain” — a curated global knowledge corpus (seeded MP docs + hand-written playbooks + cross-pollinated anonymized learnings) plus a per-customer brain that accumulates context, goals, actions, and feedback over time. The strategic bet is that this compounds into a moat: more customers feed the flywheel, longer use deepens personalization, and unlike pre-existing proprietary data it doesn’t need legal Phase 0 cover.

Business model. Connected Free tier (capped at 5 Haiku chats/month, monthly snapshot dashboard, weekly digest) plus Pro at $29/mo (unlimited Sonnet, live data, advanced reports, brain editing, eventually agent actions). The Free tier exists to fill the data flywheel cheaply — modeled at ~$1.10/mo per free user, ~$6–12/mo per Pro. Targets are 10% Free signup of MP customer base in 6 months and 5–10% Free→Pro conversion within 60 days.

Roadmap shape. V1 (mid-Oct 2026 GA) is advisor-only — no write actions. V1.5 adds the AI agent (executes membership level CRUD, content protection rules, email/dunning config via the forthcoming MemberPress MCP) plus a “Set up FOR ME” greenfield wizard with a 14-day Pro trial aligned to MP’s money-back guarantee window. V2 expands to BuddyBoss customers (effectively free since BB Memberships runs on MP underneath — sister-company partnership integration, not real platform abstraction). V2.1 (Paid Memberships Pro) and V2.2 (Kajabi) are the genuine cross-platform work. V3+ is Pro+ tier, agency pricing, SOC 2.

Team and structure. Blair is CEO. Cindy Thoennessen is Product Lead (execution, PRDs, marketing, compliance, GTM). Seth Shoultes is Lead Architect (the brain, AI/ML, engineering team). Santiago Perez Asis is the cross-Caseproof PM. The decision-rights matrix is unusually explicit because Cindy and Seth are peers — Blair has final say on strategy, PRDs, design, and material architecture; Seth owns engineering and technical decisions; Cindy owns everything else. Norms include “disagreements go to Blair within 48 hours” and “Cindy doesn’t manage Seth’s people.”

The phased ramp is the most interesting operational piece. It deliberately starts tiny — just Cindy and Seth in May for scoping — then grows by phase: engineering core in June (Ronald + Senior AI Engineer hire), design + content in July (Meo from Russ’s team, Sarah from Katelyn’s), web build in August (Kalpesh), full team plus PR firm Sept through mid-October launch, then taper in November. The load-bearing assumptions: Senior AI Engineer closes mid-June, privacy counsel engaged by June 1 (or GA slips), and Blair holds the line when home managers want their matrixed people back early.

What jumps out as load-bearing or risky. The brain quality at launch (~50 playbooks) is the differentiation moat against generic ChatGPT — if it’s shallow, the product gets compared unfavorably. Free-tier cost spiral is a real concern at 50K users. The MP MCP needs to ship before V1.5 build kicks off. And the content lead is already a bottleneck by V2 — the spec calls out hiring a second one before BuddyBoss launch.

Want me to dig into any particular thread — for example, pressure-test the phased plan, look at gaps in the decision-rights matrix, or work through one of the open questions?

Yeah. I want you to help me brainstorm the technological aspect such as servers, hosting, things like that. Security is a big one. What type of hosts? We go with Google. Cloud services. Should we run this on digital or Heroku?

Technological aspect

The choice of host shapes security, cost ceilings, and how fast the team moves — the spec leaves it as Open Q8 deliberately.

Start with the constraints, because they narrow the field fast.

You’re running a multi-tenant SaaS with per-tenant data isolation as a hard requirement, GDPR + CCPA from launch, Postgres + pgvector (or Pinecone), heavy LLM API egress, scheduled jobs (cross-pollination, weekly digests, site analysis), and webhooks from MemberPress sites + Stripe. You’re targeting tens of thousands of free users plus paying Pro. And you have a small team — Seth plus Ronald plus a Senior AI Engineer hire. That last constraint matters as much as any technical one.

The three real options.

Heroku is the fastest path to “shipped.” Postgres is managed and good. Deploys are git push. Background workers and scheduled jobs are first-class. The team can stay focused on the brain and the product instead of infrastructure. The cost is real though: you’ll outgrow it. Heroku Postgres gets expensive past mid-size, the dyno model burns money at scale, networking primitives are limited (no real VPC peering on standard tiers, restricted egress controls), and per-tenant isolation is something you build entirely in your application layer because Heroku gives you no infrastructure-level partitioning tools. For privacy counsel sign-off, you’ll be doing more talking and less pointing-at-controls.

GCP is the opposite trade. More upfront work — IAM, VPCs, Cloud SQL, Cloud Run or GKE, Secret Manager, Cloud Scheduler, Cloud Tasks, observability stack — but you get serious infrastructure-level controls. CMEK encryption, VPC Service Controls, fine-grained IAM, audit logging built in, regional data residency for GDPR, and clean paths to per-tenant isolation through schema-per-tenant or even database-per-tenant if Pro grows. Cost at scale is meaningfully better than Heroku. Cloud Run is genuinely good for the API layer — it scales to zero, requires no server management, and is more flexible than Heroku dynos.

DigitalOcean (you mentioned “digital”) sits in the middle but I’d argue it’s the worst of the three for this specific product. It’s cheaper than GCP for raw compute and simpler than GCP, but their managed Postgres is less mature than either GCP’s Cloud SQL or Heroku Postgres, their compliance posture is thinner (HIPAA only on specific products, no equivalent to VPC Service Controls), and you don’t get the “we ship fast” benefit you get from Heroku. DO is great for indie projects and certain workload shapes. It’s not the right call here.

My honest read.

The spec says “GCP or Heroku” and lets the team pick based on operational expertise. That phrasing is doing a lot of work — it’s really asking who’s going to run this thing. Heroku is the right answer if Seth and the team don’t have hands-on cloud infra experience and you want them spending zero time on Terraform and IAM. GCP is the right answer if you have or are hiring someone who’s run production cloud infrastructure before, because by year two of the product the cost and security advantages compound.

Given the SPEC’s emphasis on per-tenant isolation, GDPR/CCPA from day one, audit logging, and outside privacy counsel sign-off — plus a path to BuddyBoss and beyond at V2 — GCP is the better long-term answer, but only if you have the operational muscle. If the Senior AI Engineer hire has cloud infra chops, GCP. If not, Heroku for V1 with a planned migration trigger (e.g., “when we hit 10K users or the first compliance ask we can’t satisfy, we move”).

On security, the host is only one layer.

The bigger security questions for this product, regardless of host:

Per-tenant isolation strategy. Schema-per-tenant in Postgres, row-level security with tenant_id, or database-per-tenant for Pro? Each has tradeoffs. Schema-per-tenant is the usual sweet spot — strong isolation, manageable ops, works at the scale you’re targeting.

Secrets management. Customer Stripe tokens, MP license keys, OAuth refresh tokens — these need a real secret store (GCP Secret Manager, AWS Secrets Manager, or HashiCorp Vault), not env vars. This is non-negotiable for privacy counsel.

The vector store decision compounds with the host. pgvector keeps everything in one Postgres, which simplifies isolation (you partition with the same tenant_id strategy as everything else) and ops. Pinecone is faster at scale but adds a separate data plane that needs its own isolation story and its own DPA. Start with pgvector.

LLM egress. Every Sonnet and Haiku call leaves your cloud and goes to Anthropic. That’s fine, and Anthropic has a DPA, but it needs to be in the data flow diagram for counsel. No customer data should leave your VPC except through the LLM call path, which means egress controls matter.

Audit logging. Per the SPEC, every tier change, brain update, cross-pollination promotion, agent action (V1.5+), and data sync needs a log. This is much easier on GCP (Cloud Audit Logs are free and structured) than building it on Heroku.

Want to go deeper on any of these? I’d suggest the next thread is either (a) per-tenant isolation strategy, which drives a lot of the schema and host decisions, or (b) the actual GCP-vs-Heroku decision framed as “what would Seth need to be true to pick each one.” Either is a good 30-minute brainstorm.

I think that we will go with GCP. I am somewhat proficient with GCP when using Claude Code for help. So, let's brainstorm per-tenant isolation strategy.

GCP it is. The “proficient with GCP when using Claude Code for help” framing is honest — Seth can run real infrastructure but will lean on AI to scaffold the harder pieces. That means picking patterns that are well-documented and that Claude Code can reason about confidently, rather than exotic setups.

Let’s get into per-tenant isolation.

The three real models, what they actually mean.

Shared schema with tenant_id columns and Row-Level Security (RLS). Every table that holds customer data has a tenant_id column, and Postgres RLS policies enforce that queries can only see rows for the active tenant. The application sets a session variable like app.current_tenant_id and Postgres does the rest. One database, one schema, all customers share tables.

Schema-per-tenant. One database, but each customer gets their own Postgres schema (tenant_abc123.members, tenant_xyz789.members). Queries are scoped by setting the search path. Tables are duplicated per tenant.

Database-per-tenant. Each customer gets their own Postgres database. Strongest isolation, highest ops cost.

There’s also a hybrid: shared-schema-with-RLS for free users, dedicated-database for Pro or enterprise. We’ll come back to this because it matters for your Free/Pro split.

How they actually trade off for MemberIntel specifically.

| | Shared + RLS | Schema-per-tenant | DB-per-tenant |
|---|---|---|---|
| Isolation strength | Logical (enforced by Postgres) | Stronger logical (separate namespaces) | Physical |
| Scale to 50K free users | Excellent | Painful (50K schemas is a Postgres anti-pattern) | Impossible economically |
| Migration complexity | One migration, runs once | One migration × N tenants | One migration × N tenants |
| Backup / restore one tenant | Hard (filtered dump) | Easier (pg_dump --schema=tenant_x) | Trivial |
| Customer data export (GDPR) | Filtered query | Schema dump | Database dump |
| Customer data deletion (GDPR) | DELETE WHERE tenant_id | DROP SCHEMA | DROP DATABASE |
| Privacy counsel comfort | Good if RLS is rigorous | Better | Best |
| Per-tenant performance isolation | Weak (noisy neighbor) | Weak | Strong |
| Ops complexity | Lowest | Medium | High |
| Pinecone/pgvector partition story | Clean (tenant_id filter) | Clean (separate vector tables per schema) | Clean (separate database) |

My read for your situation.

Schema-per-tenant breaks at your free-tier scale. 50,000 schemas in one Postgres is not where you want to be. You’ll hit catalog bloat, migrations become a multi-hour job, and pg_dump of the whole database becomes painful. Cross this option off.

Database-per-tenant is economically a non-starter for free users at scale — even with Cloud SQL’s lighter instances, you can’t run 50K databases. It’s also overkill for your threat model. The data is sensitive, but it’s not HIPAA-covered, it’s not primary financial data, and it’s not credentials. It’s MP membership data and Stripe analytics that are already aggregated.

That leaves shared-schema with RLS as the right answer for V1. With one important wrinkle.

The wrinkle: hybrid for Pro down the road.

Shared-schema with RLS is right for V1 and through 50K free users plus thousands of Pro. But by the time you’re at significant scale or you’re talking to enterprise-y customers post-V2, you’ll get asked “do we share a database with other customers?” The honest answer with RLS is “logically isolated, physically shared,” and most buyers accept that. Some won’t.

So the architectural move is: build V1 on shared-schema-with-RLS, but design tenant_id as the universal partitioning key from day one — meaning every table has it, every query goes through it, and your application has zero queries that don’t filter by tenant. That way, if you later need to split a customer (or a class of customers) into a dedicated database, the application code doesn’t change. You’re just routing their connection string differently.

What “rigorous RLS” actually looks like.

This is where most teams get RLS wrong, so it’s worth being concrete.

The core pattern is that every customer-data table has tenant_id UUID NOT NULL REFERENCES tenants(id), and an RLS policy like:

ALTER TABLE members ENABLE ROW LEVEL SECURITY;
ALTER TABLE members FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON members
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

The FORCE ROW LEVEL SECURITY part is critical. Without it, the table owner (your app’s connection user, if it owns the table) bypasses RLS and you have a silent isolation hole. You want a setup where the application connects as a non-owner role that cannot bypass policies.

Then on every request, before any query touches customer data, the application sets SET LOCAL app.current_tenant_id = '...' from the authenticated session. SET LOCAL scopes it to the transaction, which is what you want.
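The per-request pattern can be sketched as a context manager, assuming a Python stack with a DB-API-style driver (an assumption — the spec names no language). FakeConnection stands in for a real connection so the shape is visible without a database:

```python
# Hedged sketch: wrap every request so RLS context is set inside the
# transaction. SET LOCAL dies with the transaction, so there is no
# tenant leakage across pooled connections.
from contextlib import contextmanager

@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Open a transaction, pin the tenant for RLS, commit on success."""
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        cur.execute("SET LOCAL app.current_tenant_id = %s", (tenant_id,))
        yield cur  # handler runs here; every query is tenant-scoped
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise

# Illustrative stand-ins so the pattern runs without Postgres.
class FakeCursor:
    def __init__(self, log): self.log = log
    def execute(self, sql, params=None): self.log.append(sql)

class FakeConnection:
    def __init__(self): self.log = []
    def cursor(self): return FakeCursor(self.log)

conn = FakeConnection()
with tenant_transaction(conn, "11111111-2222-3333-4444-555555555555") as cur:
    cur.execute("SELECT * FROM members")
assert conn.log == ["BEGIN", "SET LOCAL app.current_tenant_id = %s",
                    "SELECT * FROM members", "COMMIT"]
```

In a real framework this lives in middleware, so handler code can never forget to set the context.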

The rules that keep this airtight.

  1. Two database roles. A migration role (owner, can bypass RLS, used only by migration tooling) and an app role (cannot bypass RLS, used by all running application code). The app role’s permissions are gated by RLS on every table.

  2. No table escapes. Every table that holds customer-derived data — members, transactions, chat messages, brain entries, audit logs, embeddings, sync state — has tenant_id and RLS. Every one. This is a code review checklist item and ideally a CI check that fails if any new table is missing the column or the policy.

  3. Set the tenant context as middleware, not as caller responsibility. Wrap every request in middleware that opens a transaction, sets app.current_tenant_id, runs the handler, commits. Application code should never directly issue SET app.current_tenant_id — it should be impossible to forget.

  4. Background jobs explicitly carry tenant_id. Cross-pollination, weekly digest generation, site analysis — every job has the tenant in its payload and sets the context the same way. No “loop over tenants and process them” without re-setting context per iteration.

  5. The cross-pollination job is the one place where tenant context changes. This is the highest-risk job in your system because it intentionally reads from many per-customer brains and writes to a global brain. The privacy bug Seth’s JD warns about — “anonymization slip” — almost always happens here. The mitigation: cross-pollination runs as the migration role (no RLS), but it has a dedicated module that’s the only code that ever touches multi-tenant data, it’s heavily audited, and every output passes through an explicit anonymization step before going into the global brain.

  6. The global brain is a separate data domain. Global brain entries do NOT have tenant_id and live in tables RLS doesn’t apply to. Make this distinction explicit in schema naming — customer_brain_entries (RLS, tenant-scoped) vs global_brain_entries (no RLS, public). Don’t mix.

  7. Test isolation, don’t trust it. Write integration tests where you authenticate as tenant A and confirm you cannot read, update, or delete any data belonging to tenant B. Run these tests on every PR. This is the single best thing you can do for privacy counsel comfort.
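The CI check from rule 2 can be as simple as a script that scans migration SQL and fails the build when a table escapes the pattern. A hedged sketch in Python — the regexes are illustrative and a real setup might want a proper SQL parser; GLOBAL_TABLES encodes the rule-6 exemption:

```python
# Illustrative CI gate: every CREATE TABLE must carry tenant_id and an
# ENABLE ROW LEVEL SECURITY statement, unless it's a known global table.
import re

GLOBAL_TABLES = {"global_brain_entries", "tenants"}  # exempt by design (rule 6)

def check_migration_sql(sql: str) -> list[str]:
    problems = []
    tables = re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", sql, re.S | re.I)
    rls_enabled = set(re.findall(
        r"ALTER TABLE (\w+) ENABLE ROW LEVEL SECURITY", sql, re.I))
    for name, body in tables:
        if name in GLOBAL_TABLES:
            continue
        if "tenant_id" not in body:
            problems.append(f"{name}: missing tenant_id column")
        if name not in rls_enabled:
            problems.append(f"{name}: RLS not enabled")
    return problems

SAMPLE_SQL = """
CREATE TABLE members (
    id uuid PRIMARY KEY,
    tenant_id uuid NOT NULL REFERENCES tenants(id),
    email text
);
ALTER TABLE members ENABLE ROW LEVEL SECURITY;
CREATE TABLE rogue_notes (
    id uuid PRIMARY KEY,
    body text
);
"""
# rogue_notes should be flagged twice; members passes.
assert check_migration_sql(SAMPLE_SQL) == [
    "rogue_notes: missing tenant_id column",
    "rogue_notes: RLS not enabled",
]
```

Run this against the migrations directory in CI and the "no table escapes" rule stops depending on reviewer memory.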

On vector storage specifically.

If you go with pgvector (which I’d recommend for V1, per your spec), the embeddings table is just another table with tenant_id and RLS. Same pattern. Per-customer brain embeddings are tenant-scoped; global brain embeddings are in a separate table with no RLS. Clean.

If you ever move to Pinecone, their isolation model is namespaces — you’d use tenant_id as the namespace name. Same conceptual partition key, different mechanism. The application abstraction stays the same.
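That "same partition key, different mechanism" idea is worth pinning down as an interface. A sketch under stated assumptions — VectorStore and InMemoryPgvectorLike are names we're inventing here, and the in-memory class only mimics the shape of a pgvector table with a tenant_id column:

```python
# Illustrative abstraction: callers always pass tenant_id. A pgvector
# backend filters on the column; a Pinecone backend would use tenant_id
# as the namespace. Swapping backends never touches application code.
from abc import ABC, abstractmethod

class VectorStore(ABC):
    @abstractmethod
    def upsert(self, tenant_id: str, doc_id: str,
               embedding: list[float]) -> None: ...
    @abstractmethod
    def query(self, tenant_id: str, embedding: list[float],
              k: int) -> list[str]: ...

class InMemoryPgvectorLike(VectorStore):
    """Stand-in for a pgvector table whose rows carry tenant_id."""
    def __init__(self):
        self.rows: list[tuple[str, str, list[float]]] = []

    def upsert(self, tenant_id, doc_id, embedding):
        self.rows.append((tenant_id, doc_id, embedding))

    def query(self, tenant_id, embedding, k):
        # Tenant filter first (the RLS analogue), then nearest neighbors
        # by squared Euclidean distance.
        scoped = [(d, e) for t, d, e in self.rows if t == tenant_id]
        scoped.sort(key=lambda de: sum((a - b) ** 2
                                       for a, b in zip(de[1], embedding)))
        return [d for d, _ in scoped[:k]]
```

Per-customer brain embeddings go through this interface with the customer's tenant_id; global brain embeddings would use a separate store or a reserved global partition, mirroring the schema split above.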

On Cloud SQL specifically.

Use Cloud SQL for Postgres 16+, with these settings worth flagging:

  • Private IP only — no public IP. Connect via VPC peering or Cloud SQL Auth Proxy from Cloud Run.
  • CMEK (customer-managed encryption keys) — you control the KMS key. Privacy counsel will ask.
  • Point-in-time recovery on, retention 7+ days.
  • Audit logging via pgaudit extension to Cloud Logging.
  • Backups going to a separate GCS bucket with its own retention policy.
  • Read replica for analytics queries (the dashboard) so the primary isn’t fighting both transactional load and aggregations.

The decisions you actually need to make to move forward.

  1. Confirm shared-schema-with-RLS as the V1 model. (My recommendation: yes.)

  2. Decide whether tenants table itself lives in the same database or a separate “control plane” database. Most teams put it in the same database. It’s fine. Make it owned by the migration role and inaccessible to the app role except through specific views.

  3. Decide on tenant_id as UUID vs slug. UUID is the right answer — slugs change, UUIDs don’t, and you don’t want to rewrite foreign keys when a customer renames their site.

  4. Decide who writes the RLS scaffolding. This is exactly the kind of thing Claude Code is great at — generating the migration files, the middleware, the test harness, the CI check. Have Seth or whoever’s on infra prototype this in week one of Phase 2 so the rest of the team is building on a solid base.

Want to go deeper on the cross-pollination isolation problem specifically (it’s the highest-risk piece), or shift to a different infra question — like the Cloud Run vs GKE decision, the CI/CD setup, or how you structure GCP projects (one project? prod/staging/dev? per-environment?)?

For: Seth Shoultes, Cindy Thoennessen, Blair Williams, Santiago Perez Asis