
GCP Architecture for Enterprises: Reference Guide

Google Cloud Platform has a deceptively easy learning curve. The first projects are quick to stand up, but a GCP architecture that is secure, multi-team, auditable, and cost-efficient requires design decisions from day one that are difficult to change later. This guide documents the architecture decisions we make in real enterprise GCP projects — the patterns that scale and the anti-patterns that create technical debt.

Resource Hierarchy: the foundation of everything else

The most common mistake in companies starting with GCP: creating projects directly under the organization without planning the hierarchy. Without folders, every policy and IAM grant has to be applied either to the whole organization or project by project, which makes it very hard to keep policies consistent across teams, report billing by business unit, or delegate access without over-granting permissions.

text
Organization: company.com
├── Folder: production/
│   ├── Project: prod-networking       # Shared VPC, Cloud DNS, NAT
│   ├── Project: prod-data             # BigQuery, Cloud SQL
│   └── Project: prod-apps            # GKE, Cloud Run, APIs
├── Folder: non-production/
│   ├── Project: staging-apps
│   └── Project: dev-sandbox
└── Folder: shared-services/
    ├── Project: monitoring            # Centralized Cloud Monitoring
    └── Project: security             # Security Command Center, KMS

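Separating projects by function (networking, data, apps) creates natural blast radius boundaries. IAM grants made on a project apply only to that project, so an IAM incident in the apps project doesn't compromise KMS keys in the security project unless someone has granted cross-project access. Billing reports can be filtered by folder and project, so a hierarchy that mirrors how the business is organized gives cost reporting per unit without depending entirely on manual tagging.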
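As a rough illustration, part of the hierarchy above could be expressed in Terraform along these lines. This is a minimal sketch rather than a complete landing zone: the variables `org_id` and `billing_account_id` are assumed inputs that do not appear elsewhere in this guide, and the project IDs are placeholders (real project IDs must be globally unique).

```hcl
# Assumed inputs (hypothetical names, supplied per organization)
variable "org_id" {
  type        = string
  description = "Numeric ID of the GCP organization"
}

variable "billing_account_id" {
  type        = string
  description = "Billing account to attach to new projects"
}

# Top-level folders directly under the organization
resource "google_folder" "production" {
  display_name = "production"
  parent       = "organizations/${var.org_id}"
}

resource "google_folder" "shared_services" {
  display_name = "shared-services"
  parent       = "organizations/${var.org_id}"
}

# Projects are created inside folders, never directly under the organization
resource "google_project" "prod_apps" {
  name            = "prod-apps"
  project_id      = "prod-apps" # placeholder; must be globally unique
  folder_id       = google_folder.production.name
  billing_account = var.billing_account_id
}

resource "google_project" "security" {
  name            = "security"
  project_id      = "security" # placeholder; must be globally unique
  folder_id       = google_folder.shared_services.name
  billing_account = var.billing_account_id
}
```

Keeping folders and projects in code means a new team or environment is a reviewed change to the hierarchy, not an ad hoc project created from the console.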

Shared VPC: the correct network pattern for enterprises

With Shared VPC, a host project centralizes the network (subnets, firewall rules, Cloud NAT, and hybrid connectivity to on-premises over Cloud VPN or Cloud Interconnect), and service projects consume that network without managing their own network resources. This avoids a proliferation of isolated VPCs, simplifies connectivity, and centralizes traffic auditing.

hcl
# Terraform: enable Shared VPC on the host project
resource "google_compute_shared_vpc_host_project" "host" {
  project = var.host_project_id
}

# Attach the apps project as a service project of the host
resource "google_compute_shared_vpc_service_project" "apps" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = var.apps_project_id
}

# Custom-mode network owned by the host project (the name is illustrative)
resource "google_compute_network" "shared_vpc" {
  name                    = "shared-vpc"
  project                 = var.host_project_id
  auto_create_subnetworks = false
}

# Subnet with Private Google Access so VMs reach Google APIs without public IPs
resource "google_compute_subnetwork" "apps_subnet" {
  name                     = "apps-subnet"
  project                  = var.host_project_id
  ip_cidr_range            = "10.10.0.0/24"
  region                   = "us-central1"
  network                  = google_compute_network.shared_vpc.id
  private_ip_google_access = true
}
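
Attaching a service project is not enough on its own: principals in the service project also need permission to use the shared subnets. A minimal sketch of a subnet-level grant follows, building on the `apps_subnet` resource above. The variable `apps_deployer_member` is an assumption introduced for illustration; in practice the member is usually a deployment service account or a Google-managed service agent from the service project.

```hcl
# Assumed input (hypothetical name): the principal that deploys into the subnet
variable "apps_deployer_member" {
  type        = string
  description = "IAM member, e.g. serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com"
}

# Grant network use on this one subnet only, not on the whole host project
resource "google_compute_subnetwork_iam_member" "apps_network_user" {
  project    = google_compute_subnetwork.apps_subnet.project
  region     = google_compute_subnetwork.apps_subnet.region
  subnetwork = google_compute_subnetwork.apps_subnet.name
  role       = "roles/compute.networkUser"
  member     = var.apps_deployer_member
}
```

Granting `roles/compute.networkUser` per subnet keeps each team confined to the address ranges allocated to it.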

GKE Autopilot vs. Standard: the right decision by use case

GKE Autopilot manages the data plane for you: Google provisions and scales nodes, applies security patches, and optimizes pod packing. Pricing is based on the CPU, memory, and storage that pods request, not on nodes. GKE Standard gives full control over node pools but requires you to operate the data plane yourself.

  • Autopilot: recommended for most enterprise workloads. It eliminates node operations, enforces hardened security defaults, and per-pod billing reduces waste from idle node capacity.
  • Standard: necessary when you need node-level control that Autopilot restricts (privileged DaemonSets, custom node images or kernel settings, hardware configurations Autopilot doesn't offer), or have workloads with highly variable resource profiles that Autopilot can't optimize.
  • In practice: most enterprises should start with Autopilot and migrate to Standard only when they hit concrete limitations.
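
For the "start with Autopilot" path, the cluster definition itself stays small. The sketch below assumes the `var.apps_project_id` variable from the Shared VPC example; the cluster name is hypothetical. It deliberately leaves out the Shared VPC attachment, which also requires secondary IP ranges on the subnet.

```hcl
resource "google_container_cluster" "apps" {
  name     = "apps-autopilot" # hypothetical name
  project  = var.apps_project_id
  location = "us-central1" # Autopilot clusters are regional

  # Google manages nodes, scaling, and upgrades; no node pools are declared
  enable_autopilot = true

  # VPC-native networking; an empty block lets GKE choose the pod and service ranges
  ip_allocation_policy {}
}
```

Note what is absent: there are no node pool, machine type, or autoscaling settings to maintain, which is the operational saving described above.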

Workload Identity: no credentials in code

The most dangerous anti-pattern in GCP: creating service account JSON keys and distributing them as environment variables or secrets. If that key leaks, the compromised access has no automatic expiration. Workload Identity lets GKE pods access GCP APIs with a service account identity, without static credentials.

yaml
# Kubernetes ServiceAccount with Workload Identity binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service
  namespace: production
  annotations:
    iam.gke.io/gcp-service-account: api-service@prod-apps.iam.gserviceaccount.com

bash
# Allow the Kubernetes ServiceAccount (KSA) to impersonate the Google service account (GSA)
gcloud iam service-accounts add-iam-policy-binding \
  api-service@prod-apps.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:prod-apps.svc.id.goog[production/api-service]"

With Workload Identity, pods receive short-lived access tokens (valid for about an hour by default) that are refreshed automatically. There are no JSON keys to rotate manually, no risk of keys leaking into repositories, and access is auditable per service account in Cloud Audit Logs.
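
Once workloads no longer need JSON keys, one way to keep them from reappearing is an organization policy that blocks key creation. The following is a sketch, assuming the policy is set on a production folder; `production_folder_id` is a hypothetical variable, and the same constraint can be set at the organization or project level instead.

```hcl
# Assumed input (hypothetical name): numeric ID of the folder to protect
variable "production_folder_id" {
  type        = string
  description = "Numeric ID of the production folder"
}

# Enforce the built-in boolean constraint that disables service account key creation
resource "google_org_policy_policy" "disable_sa_key_creation" {
  name   = "folders/${var.production_folder_id}/policies/iam.disableServiceAccountKeyCreation"
  parent = "folders/${var.production_folder_id}"

  spec {
    rules {
      enforce = "TRUE"
    }
  }
}
```

Because organization policies are inherited, every project under the folder picks up the restriction, including projects created later.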

Cloud Run for APIs without cluster management

For stateless HTTP APIs, Cloud Run is frequently better than GKE. No node management, no pod scheduling concerns, automatic scale-to-zero, and per-request pricing. The tradeoff: less control over the execution environment and cold starts for services with sporadic traffic.

yaml
# Cloud Run service with VPC connector and no direct public traffic
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: payment-api
  annotations:
    # Service-level setting: accept traffic only from internal sources and the load balancer
    run.googleapis.com/ingress: internal-and-cloud-load-balancing
spec:
  template:
    metadata:
      annotations:
        # VPC access is configured per revision, so these go on the template
        run.googleapis.com/vpc-access-connector: projects/prod-networking/locations/us-central1/connectors/vpc-connector
        run.googleapis.com/vpc-access-egress: private-ranges-only
    spec:
      serviceAccountName: api-service@prod-apps.iam.gserviceaccount.com
      containers:
      # In production, prefer a version tag or digest over :latest
      - image: us-central1-docker.pkg.dev/prod-apps/api/payment-api:latest
        resources:
          limits:
            cpu: "2"
            memory: "1Gi"

Network security: VPC Service Controls

VPC Service Controls creates a security perimeter around Google-managed services (BigQuery, Cloud Storage, Cloud SQL) that blocks data from being copied to resources outside the perimeter, even if a service account's credentials are compromised. Protected APIs accept requests only from projects inside the perimeter or from sources that have been explicitly authorized.

For enterprises handling regulated data (PCI-DSS, HIPAA, financial data), VPC Service Controls isn't optional — it's the control that prevents an IAM breach from becoming a data breach.
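
A perimeter around the data project might look roughly like the sketch below. It assumes an Access Context Manager access policy already exists for the organization; `access_policy_id` and `prod_data_project_number` are hypothetical variables, and the list of restricted services is only an example.

```hcl
# Assumed inputs (hypothetical names)
variable "access_policy_id" {
  type        = string
  description = "Numeric ID of the organization's Access Context Manager policy"
}

variable "prod_data_project_number" {
  type        = string
  description = "Project number (not project ID) of the data project"
}

resource "google_access_context_manager_service_perimeter" "prod_data" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/prod_data"
  title  = "prod_data"

  status {
    # Projects inside the perimeter are referenced by project number
    resources = ["projects/${var.prod_data_project_number}"]

    # APIs that can only be reached from inside the perimeter
    restricted_services = [
      "bigquery.googleapis.com",
      "storage.googleapis.com",
    ]
  }
}
```

Perimeters can block legitimate traffic such as CI pipelines or analyst access, so it is worth evaluating a new perimeter in dry-run mode before enforcing it.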

Frequently Asked Questions

GCP, AWS, or Azure for a company new to cloud?
The honest answer: it depends on where your talent and workloads are. GCP is the best choice if the technical team has experience with Google tools (BigQuery, Kubernetes, data), or if your core business is data analytics/ML. AWS has the largest ecosystem and talent pool. Azure is the natural choice if you already have Microsoft contracts (Office 365, Active Directory). For most companies in LATAM, AWS and GCP are equivalent in capability — the difference is in the team and long-term strategy.
How do I control GCP costs across multiple projects?
Budgets and alerts per project and folder are the basic control. For real visibility: export Cloud Billing to BigQuery and build dashboards in Looker Studio with cost per team, project, and service. Mandatory cost labels on all resources (team, environment, application) enable drill-down. GCP Recommender automatically identifies underutilized resources.
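
As a sketch of the basic control, a per-project budget with alert thresholds could be declared like this. The variables `budget_billing_account` and `prod_apps_project_number` are hypothetical, and the amount and thresholds are illustrative values, not recommendations.

```hcl
# Assumed inputs (hypothetical names)
variable "budget_billing_account" {
  type        = string
  description = "Billing account ID that pays for the project"
}

variable "prod_apps_project_number" {
  type        = string
  description = "Project number of the project to track"
}

resource "google_billing_budget" "prod_apps" {
  billing_account = var.budget_billing_account
  display_name    = "prod-apps monthly budget"

  # Limit the budget to a single project
  budget_filter {
    projects = ["projects/${var.prod_apps_project_number}"]
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = "5000" # illustrative monthly amount
    }
  }

  # Alert at 50% and 90% of actual spend, and when the forecast reaches 100%
  threshold_rules {
    threshold_percent = 0.5
  }
  threshold_rules {
    threshold_percent = 0.9
  }
  threshold_rules {
    threshold_percent = 1.0
    spend_basis       = "FORECASTED_SPEND"
  }
}
```

Budgets only notify; they do not cap spending, so they complement the billing export and labels described above.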
How do I migrate from on-premises to GCP without cutting existing services?
The standard strategy for zero-downtime migration: Cloud VPN or Dedicated Interconnect for hybrid connectivity, per-service migration using the Strangler Fig pattern, and a coexistence period where traffic is gradually split. GCP migration tools (Migrate to VMs, Database Migration Service) automate the heaviest lifting.
What is Cloud Armor and when do I need it?
Cloud Armor is GCP's WAF and DDoS protection service, integrated with Cloud Load Balancing. Treat it as a baseline for any internet-facing service that handles user data or processes payments. It provides preconfigured WAF rules that target OWASP Top 10 risks, adaptive protection against DDoS attacks, and reCAPTCHA Enterprise integration for bot protection.
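
A minimal security policy using two of the preconfigured WAF rule sets might look like the sketch below. The policy name is hypothetical, and the rule set versions are examples to check against the current list; the policy takes effect only once it is attached to a load balancer backend service.

```hcl
resource "google_compute_security_policy" "edge_waf" {
  name = "edge-waf" # hypothetical name

  # Block requests matching the preconfigured SQL injection rule set
  rule {
    action      = "deny(403)"
    priority    = 1000
    description = "Preconfigured SQL injection rules"
    match {
      expr {
        expression = "evaluatePreconfiguredExpr('sqli-v33-stable')"
      }
    }
  }

  # Block requests matching the preconfigured cross-site scripting rule set
  rule {
    action      = "deny(403)"
    priority    = 1001
    description = "Preconfigured XSS rules"
    match {
      expr {
        expression = "evaluatePreconfiguredExpr('xss-v33-stable')"
      }
    }
  }

  # Default rule (lowest priority): allow everything not matched above
  rule {
    action      = "allow"
    priority    = 2147483647
    description = "Default allow"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
  }
}
```

Running new WAF rules in preview mode first helps surface false positives before they block real users.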
Does GKE Autopilot have relevant limitations for enterprise production?
The most relevant limitations: no privileged pods or host-level access (ordinary DaemonSets are supported, but agents that need privileged access to the node are restricted), no customizing the node kernel or installing your own hardware-specific drivers, and scheduling and billing are driven by pod resource requests, so those need to be set deliberately. For most enterprise workloads these aren't restrictions, but if you depend on privileged monitoring or security agents, or need node-level tuning, GKE Standard is necessary.

Is your company evaluating GCP or needing to better structure its existing cloud architecture? Our team can do an assessment and propose a reference architecture.


Engineering Team — IQS

Software, cloud, and DevOps engineers with enterprise project experience.