ADR-0013: GCP Project Structure — Project-per-Environment + Shared Tooling

Status: Proposed (V1 ships with single project; four-project structure targeted for V1.5)
Date: 2026-05-14
Deciders: Seth (Lead Architect)

Context

The spec and architecture deep-dive call for four GCP projects under a Google Cloud Organization: memberintel-prod, memberintel-staging, memberintel-dev, and memberintel-shared. This is the standard pattern for serious SaaS — each environment gets its own billing line, IAM boundary, VPC, and blast radius. The production folder enforces Organization Policies (no public IPs, no service account key creation) that don’t apply to non-production.

Current state: A single GCP project, memberintel-v1, holds all staging resources. Terraform in infra/ manages Cloud SQL (memberintel-db-staging), Cloud Run (memberintel-api-staging), a VPC connector, Artifact Registry, eight Secret Manager secrets, and domain mapping. Terraform state lives in gs://memberintel-terraform-state. There is no Google Cloud Organization, no folder hierarchy, and no Organization Policies in effect.

This single-project setup was the right call for getting staging live fast. It cannot become production. The spec is explicit: prod resources must live in a separate project with org-policy-enforced constraints, and no production data may ever be copied to a non-production environment.

Forces at play:

Isolation: A misconfigured staging deploy must never touch production data. Project boundaries are the strongest GCP isolation primitive.
Audit readiness: Privacy counsel (Allen, late-May architecture review) will ask whether production data can leak to lower environments. Project separation is the answer.
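Egress control: IP allowlisting for Anthropic and Stripe needs a stable source address. Cloud NAT with a reserved static egress IP is configured per VPC network and region, so each environment project gets its own; shared ephemeral Google IPs cannot be allowlisted.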
Org policies: Constraints like constraints/compute.vmExternalIpAccess (no external IPs on VMs), constraints/sql.restrictPublicIp (no public IPs on Cloud SQL), and constraints/iam.disableServiceAccountKeyCreation belong on the production folder, not replicated per resource.
Cost of migration: Splitting a single project into four requires creating the org, migrating state, and re-provisioning resources. Deferring this to V1.5 is acceptable only because V1 ships staging-only and has no production data yet.

Decision

Target architecture: four projects in a Google Cloud Organization.

Google Cloud Organization
├── production/
│   └── memberintel-prod        # Cloud Run, Cloud SQL, VPC, Secret Manager, Cloud Tasks
├── non-production/
│   ├── memberintel-staging     # Mirror of prod, smaller tier, synthetic data only
│   └── memberintel-dev         # Per-engineer iteration, loose IAM
└── (org-level)
    └── memberintel-shared      # Artifact Registry, Terraform state (GCS), KMS keyrings, DNS zones

Per-project responsibilities:

memberintel-prod: The only project that holds real customer data. Access is restricted to three principals: Seth, the Senior AI Engineer, and a deploy service account. No standing write access for others. Short-lived elevation for debugging only.
memberintel-staging: Structural mirror of prod (Cloud Run, Cloud SQL, VPC, Cloud NAT), smaller instance tiers, synthetic data only. Never a copy of production. The moment prod data is copied to staging “just to debug something,” the isolation story collapses and privacy counsel’s signoff evaporates. Hard rule.
memberintel-dev: Looser IAM. Engineers can create their own Cloud Run revisions, spin up temporary Cloud SQL instances, and iterate freely. No real data, lower cost thresholds.
memberintel-shared: Artifact Registry (Docker images promote across environments from a single registry), Cloud Build triggers, the GCS bucket for Terraform state, KMS keyrings for all environments, DNS zones. This avoids duplicating CI/CD infrastructure across the three environment projects.
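The layout and the "no production data in lower environments" rule can be written down as data plus a guard. This is a minimal sketch, not part of the ADR: the Project type, the holds_real_data flag, and copy_allowed are hypothetical names chosen for illustration, and the folder labels simply mirror the tree above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    project_id: str
    folder: str            # "production", "non-production", or "org-level"
    holds_real_data: bool  # per the ADR, only memberintel-prod holds customer data


PROJECTS = {
    p.project_id: p
    for p in (
        Project("memberintel-prod", "production", True),
        Project("memberintel-staging", "non-production", False),
        Project("memberintel-dev", "non-production", False),
        Project("memberintel-shared", "org-level", False),
    )
}


def copy_allowed(source: str, destination: str) -> bool:
    """Return False for any copy that would move real data out of its project."""
    src = PROJECTS[source]
    PROJECTS[destination]  # raises KeyError for an unknown destination project
    if src.holds_real_data and destination != source:
        return False
    return True
```

A data-refresh script could call copy_allowed before starting any export and abort when it returns False. The real enforcement still comes from IAM and project boundaries; a check like this only catches mistakes earlier.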

Per-environment VPCs with private IP ranges:

Each environment project has its own VPC. Cloud SQL instances use private IP only (no public IP). Cloud Run connects to the VPC via a Serverless VPC Access Connector. Egress to Anthropic and Stripe routes through Cloud NAT with a static external IP — both vendors allow IP-based allowlisting, and privacy counsel appreciates auditable egress rather than random Google IPs.
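The ADR does not specify CIDR ranges, so the ones below are placeholders. As a rough sketch, a pre-apply check could confirm that each environment's range is private and that the ranges do not overlap. Separate VPCs do not strictly require distinct ranges, so the overlap check is an assumption here; it keeps later peering or shared connectivity possible.

```python
import ipaddress
from itertools import combinations

# Hypothetical ranges for illustration only; the ADR does not define them.
VPC_RANGES = {
    "memberintel-prod": "10.10.0.0/20",
    "memberintel-staging": "10.20.0.0/20",
    "memberintel-dev": "10.30.0.0/20",
}


def validate_ranges(ranges: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}

    # Every environment range must be private address space.
    for name, net in networks.items():
        if not net.is_private:
            problems.append(f"{name}: {net} is not a private range")

    # No two environments should share address space.
    for (a, net_a), (b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} and {b} overlap: {net_a} / {net_b}")

    return problems
```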

Organization Policies on the production folder:

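constraints/compute.vmExternalIpAccess and constraints/sql.restrictPublicIp: forbid external IPs on Compute Engine VMs and public IPs on Cloud SQL instances in the production project. The first constraint covers VMs only, so Cloud SQL needs the second. Data services are private-IP-only, full stop.
constraints/iam.disableServiceAccountKeyCreation: no JSON key downloads. Workload Identity Federation for CI/CD, not service account keys.
Required labels on all resources (environment, team, cost-center). No built-in constraint enforces labels, and constraints/storage.requiredStorageClass is not a label mechanism, so this needs custom organization policy constraints where the resource type supports them, backed by checks in the Terraform pipeline.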
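Independent of how the organization-policy side is configured, the required labels can also be checked before deployment. The following is a small sketch of such a pipeline check; the function name and the idea of running it in CI are assumptions, while the three label keys come from the list above.

```python
REQUIRED_LABELS = ("environment", "team", "cost-center")


def missing_labels(labels: dict[str, str]) -> list[str]:
    """Return the required label keys that are absent or empty."""
    return [key for key in REQUIRED_LABELS if not labels.get(key)]
```

A pipeline step could call missing_labels on each planned resource's labels and fail the run when the result is non-empty.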

V1.5 migration path:

V1 ships with the current single memberintel-v1 project. When production readiness is on the horizon (V1.5), the migration is:

1. Create the Google Cloud Organization and folder structure.
2. Provision the four projects via Terraform (new modules, not refactored infra/).
3. Turn the memberintel-v1 resources into memberintel-staging, either by importing them into the new Terraform configuration or by recreating them and cutting over.
4. Stand up memberintel-prod empty, apply org policies to the production folder, and deploy the production Cloud Run + Cloud SQL + VPC + Cloud NAT stack.
5. Point DNS for production traffic at memberintel-prod and staging traffic at memberintel-staging.

The current infra/ directory’s Terraform state (gs://memberintel-terraform-state) will become the staging state file. New Terraform modules will manage prod, dev, and shared in separate state files per the spec’s “one state file per environment” convention.
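One possible reading of "one state file per environment" is a single bucket with one prefix per environment. That layout is an assumption, not something the ADR states; the bucket name is the one given above, and the prefix scheme is hypothetical. The returned keys match the bucket and prefix arguments of Terraform's gcs backend.

```python
STATE_BUCKET = "memberintel-terraform-state"  # bucket name from the ADR
ENVIRONMENTS = ("prod", "staging", "dev", "shared")


def state_backend(environment: str) -> dict[str, str]:
    """Return GCS backend settings for one environment's state file."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    # Assumption: one bucket, one prefix per environment (hypothetical scheme).
    return {"bucket": STATE_BUCKET, "prefix": f"env/{environment}"}
```

Keeping the mapping in one function makes it hard for two environments to end up sharing a state path by accident.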

Consequences

Positive:

Strongest available GCP isolation: a staging bug or misconfiguration cannot reach production data.
Org policies on the production folder are enforced by Google, not by convention — no one can accidentally create a public IP or download a service account key in memberintel-prod.
Cloud NAT static egress IPs satisfy Anthropic and Stripe IP allowlisting and give privacy counsel an auditable egress story.
memberintel-shared centralizes tooling (Artifact Registry, KMS, Terraform state) and avoids triple duplication.
Docker images are built once in shared and promoted to each environment — same image, different config. Rollback by SHA is straightforward.
Per-project billing makes cost attribution trivial.
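The "build once, promote everywhere" point can be illustrated by pinning every environment to the same image digest in the shared registry. This is a sketch under assumptions: the region, repository name, and image name are placeholders, and only the memberintel-shared project ID and the Artifact Registry path shape are taken as given.

```python
import re

ENVIRONMENTS = ("dev", "staging", "prod")


def image_ref(
    digest: str,
    *,
    region: str = "us-central1",     # placeholder region
    repository: str = "containers",  # placeholder repository name
    image: str = "memberintel-api",  # placeholder image name
) -> str:
    """Build a digest-pinned Artifact Registry reference in the shared project."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{region}-docker.pkg.dev/memberintel-shared/{repository}/{image}@{digest}"


def deployment_plan(digest: str) -> dict[str, dict[str, str]]:
    """Map each environment to its project and the single shared image."""
    ref = image_ref(digest)
    return {env: {"project": f"memberintel-{env}", "image": ref} for env in ENVIRONMENTS}
```

Because every environment references the same digest, a rollback amounts to redeploying with an earlier digest, and configuration stays the only thing that differs between environments.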

Negative / costs:

Four projects means four sets of IAM policies, plus a VPC, a Cloud NAT gateway, and Cloud SQL for each of the three environment projects (memberintel-shared holds tooling only). Minimum viable GCP spend is higher than a single project.
Google Cloud Organization setup requires a domain-verified Cloud Identity account. If Caseproof doesn’t already have one, this is a prerequisite that can take days.
Migrating from the current single project to the four-project structure requires state migration and a coordinated cutover. Risk of downtime during DNS propagation.
Terraform modules become more complex: per-environment variable files, per-environment state backends, cross-project references (shared Artifact Registry accessed by prod/staging/dev service accounts).
The dev environment’s looser IAM is a tradeoff: engineers need freedom to iterate, but permissive dev projects accumulate resources and cost if not watched.

Mitigations:

V1 ships staging-only in the single project. The four-project migration is a V1.5 milestone with a dedicated Terraform sprint. No rush.
Google Cloud Identity free tier is sufficient for the org. Domain verification for caseproofagent.com is already in place (Cloudflare DNS is configured).
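Terraform state migration uses terraform state mv and terraform import, which change only what Terraform tracks and do not move resources between projects. Staging therefore avoids recreation only if the existing memberintel-v1 project is moved into the organization and kept as the staging project; a new memberintel-staging project ID means recreating resources and cutting over. Prod and dev are greenfield.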
Cloud NAT cost is minimal (per-GiB egress, and the volume is low). Each static IP is ~$7/month.
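Dev project gets budget alerts at $50 and $100. Budget alerts notify but do not cap spend, and IAM conditions cannot express a spending limit, so containing runaway costs needs a separate mechanism, such as resource quotas or automation triggered by budget notifications.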
Org policies are folder-level, not per-resource, so they’re written once and inherited.
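The dev budget thresholds above reduce to a simple comparison. As a sketch only, the helper below reports which of the $50 and $100 thresholds a given month-to-date spend has crossed; the function name is hypothetical, and in practice the alerting would come from the billing budget itself rather than from custom code.

```python
DEV_BUDGET_THRESHOLDS_USD = (50, 100)  # alert levels named in the ADR


def crossed_thresholds(spend_usd: float) -> list[int]:
    """Return the alert thresholds that the current spend has reached."""
    return [t for t in DEV_BUDGET_THRESHOLDS_USD if spend_usd >= t]
```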

Alternatives considered

Single project with named environments (current state) — rejected for production use: no project-boundary isolation, no org-policy enforcement, can’t satisfy privacy counsel’s requirement that prod data never reaches a lower environment.
Two projects (prod + non-prod) — rejected: staging and dev have different IAM profiles and cost tolerances. Collapsing them into one project means either staging is too locked down (engineers can’t iterate) or dev is too loose (staging isn’t reliable enough for pre-prod validation).
Three projects (prod + staging + dev, no shared) — viable but inferior: Artifact Registry, KMS, and Terraform state must live somewhere. Without a shared project, they live in one of the three environment projects, creating a cross-dependency and making it harder to enforce “prod project has no CI/CD infra, only runtime resources.”
Project-per-tenant — rejected for V1: massive overkill for the current customer count. Worth knowing it exists in case an enterprise deal post-V2 demands dedicated project isolation, but building for it now violates YAGNI.

For: Seth Shoultes, AI Engineer, Blair Williams, Santiago Perez Asis, Product Lead