Module structure that scales with the team
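The most common mistake in companies adopting Terraform is a monolithic root module that defines all infrastructure in a single directory, backed by a single state file. This creates a large blast radius (one error can affect all infrastructure), makes plan/apply slow, and makes it hard for multiple teams to work in parallel. A layout that separates reusable modules from per-environment configuration avoids these problems: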
infrastructure/
├── modules/
│   ├── eks-cluster/         # reusable module: EKS cluster
│   ├── rds-postgres/        # reusable module: RDS database
│   ├── vpc-networking/      # reusable module: VPC, subnets, routing
│   └── monitoring-stack/    # reusable module: Prometheus + Grafana
├── environments/
│   ├── production/
│   │   ├── main.tf          # invokes modules with production vars
│   │   ├── variables.tf
│   │   └── backend.tf
│   └── staging/
│       ├── main.tf
│       └── backend.tf
└── global/
    ├── iam/                 # cross-environment roles and policies
    └── dns/                 # shared DNS zones
Modules are code with a defined interface; environments are invocations of those modules with specific parameters. The platform team maintains the modules, while product teams configure their environments independently, without coupling to one another.
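To make the blast-radius point concrete, here is a minimal Python sketch, assuming the directory layout above, that maps the files changed in a PR to the root directories (the places where terraform plan runs, each with its own state) that need a new plan. The function name, the root list, and the rule that any module change affects every root are simplifying assumptions; a real pipeline would resolve which roots actually call the changed module.

```python
from pathlib import PurePosixPath

# Hypothetical list of root directories (each has its own state), per the layout above.
ALL_ROOTS = [
    "environments/production",
    "environments/staging",
    "global/iam",
    "global/dns",
]


def roots_to_plan(changed_files, all_roots):
    """Return the sorted root dirs whose plan may change.

    Paths are relative to infrastructure/. Simplifying assumption: a change
    under modules/ may affect every root, because any root could call it.
    """
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] == "modules":
            return sorted(all_roots)
        if len(parts) >= 3 and parts[0] in ("environments", "global"):
            affected.add(f"{parts[0]}/{parts[1]}")
    return sorted(affected)


print(roots_to_plan(["environments/staging/main.tf"], ALL_ROOTS))
# ['environments/staging']
print(roots_to_plan(["modules/rds-postgres/main.tf"], ALL_ROOTS))
# ['environments/production', 'environments/staging', 'global/dns', 'global/iam']
```

With a monolithic directory, every change lands in the single root, so every PR plans (and risks) everything; with separate roots, a staging edit only touches staging state.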
Remote State: the most costly Terraform mistake in enterprises
Keeping the state file local is the beginner mistake with the most severe production consequences. In an enterprise, multiple people apply changes; without remote state and locking, two simultaneous apply runs can corrupt the state. If state is lost or corrupted, Terraform no longer knows which resources it manages and may try to recreate them, which in production can become a serious incident.
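terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/eks/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/abc-123"
  }
}
The DynamoDB table provides the lock that prevents two terraform apply runs from writing the same state at the same time; the table needs a partition key named LockID of type String. KMS encryption is mandatory because the state file stores resource attributes in plaintext, including sensitive values such as RDS passwords and API tokens. Terraform 1.10 and later can also lock with a lock file stored in the S3 bucket itself (use_lockfile = true), which is intended to replace the DynamoDB table. In GCP, the equivalent is the gcs backend: a GCS bucket with versioning and uniform bucket-level access enabled, where the backend handles locking itself without a separate lock table.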
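The locking guarantee can be modelled in a few lines. The Python sketch below is a hypothetical in-memory stand-in for the lock table, not Terraform's or AWS's implementation: acquiring the lock is a conditional "put only if no item with this LockID exists" (the column Terraform's DynamoDB lock table is keyed on), so a second concurrent apply is rejected instead of overwriting the state. The lock ID string and the owner names are illustrative.

```python
import threading


class LockConflict(Exception):
    """Raised when another run already holds the lock for a state file."""


class InMemoryLockTable:
    """Hypothetical stand-in for the lock table, keyed by LockID."""

    def __init__(self):
        self._items = {}
        # Makes check-then-put atomic, like a conditional write.
        self._mutex = threading.Lock()

    def acquire(self, lock_id, owner):
        with self._mutex:
            if lock_id in self._items:
                raise LockConflict(f"{lock_id} is locked by {self._items[lock_id]}")
            self._items[lock_id] = owner

    def release(self, lock_id, owner):
        with self._mutex:
            # Only the holder may release; other callers are ignored.
            if self._items.get(lock_id) == owner:
                del self._items[lock_id]


table = InMemoryLockTable()
state = "company-terraform-state/production/eks/terraform.tfstate"  # illustrative ID
table.acquire(state, "alice")      # first apply takes the lock
try:
    table.acquire(state, "bob")    # a concurrent apply is rejected
except LockConflict as err:
    print(err)  # company-terraform-state/production/eks/terraform.tfstate is locked by alice
table.release(state, "alice")
table.acquire(state, "bob")        # after release, the next run can proceed
```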
Terragrunt: DRY configuration across environments
Native Terraform has a repetition problem across environments: staging and production code is practically identical except for a few values. Terragrunt adds a configuration layer that lets you define backend, providers, and common configuration exactly once.
# root terragrunt.hcl — shared configuration for all environments
remote_state {
  backend = "s3"
  config = {
    bucket         = "company-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}
Each environment then needs only a minimal terragrunt.hcl that includes the root file and sets its environment-specific values; the bulk of the configuration lives in the root and is inherited automatically. Because the state key is built from path_relative_to_include(), each environment directory gets its own state file without repeating the backend block, and the generate setting writes the matching backend.tf for each environment. This removes the duplication that otherwise forces a single backend change to be repeated across 4-5 files.
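A minimal Python model of the two mechanisms described above, assuming a root terragrunt.hcl in infrastructure/ and one child per environment directory: path_relative_to_include() yields the child's path relative to the included root file, so each environment gets a distinct state key, and the child's values are layered over the root's shared ones. The Python function names and input values are hypothetical, and Terragrunt's real merge rules are richer than the shallow dict merge shown here.

```python
from pathlib import PurePosixPath


def path_relative_to_include(root_dir, child_dir):
    """Model of Terragrunt's function: the child's path relative to the root config."""
    return str(PurePosixPath(child_dir).relative_to(root_dir))


def render_backend_config(root_dir, child_dir, shared):
    """Apply the root remote_state template to one child directory."""
    key = f"{path_relative_to_include(root_dir, child_dir)}/terraform.tfstate"
    return {**shared, "key": key}


def effective_inputs(root_inputs, child_inputs):
    """Child values override root defaults (shallow merge, for illustration only)."""
    return {**root_inputs, **child_inputs}


SHARED_BACKEND = {
    "bucket": "company-terraform-state",
    "region": "us-east-1",
    "encrypt": True,
    "dynamodb_table": "terraform-state-lock",
}

for env in ("environments/production", "environments/staging"):
    cfg = render_backend_config("infrastructure", f"infrastructure/{env}", SHARED_BACKEND)
    print(cfg["key"])
# environments/production/terraform.tfstate
# environments/staging/terraform.tfstate

print(effective_inputs({"instance_type": "t3.large", "team": "platform"},
                       {"instance_type": "m6i.xlarge"}))
# {'instance_type': 'm6i.xlarge', 'team': 'platform'}
```

The point of the design is that a change to SHARED_BACKEND (for example a new bucket) is made once and every environment picks it up, while state keys stay unique per directory.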
CI/CD Pipeline for Terraform: plan as artifact
The correct CI/CD flow for enterprise Terraform is: terraform plan on each PR, with the changes posted as a comment on the PR; team review and approval; and terraform apply on merge, either automatically or manually with explicit approval. The plan file must be saved as a pipeline artifact and passed to the apply step instead of generating a new plan, so that what was approved is exactly what gets applied. Terraform also refuses to apply a saved plan if the state has changed since the plan was created, so a stale plan fails rather than applying unreviewed changes.
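For the "changes posted as a comment on the PR" step, a pipeline can convert the saved plan to JSON with terraform show -json tfplan.binary and summarize it. The Python sketch below assumes that JSON layout (a resource_changes list whose entries carry an address and a change.actions list such as ["create"] or ["delete", "create"]); the summarize_plan function and the comment format are hypothetical, and posting the comment to GitLab or GitHub is left out.

```python
import json
from collections import Counter


def classify(actions):
    """Map Terraform's plan action list to one label, or None for no change."""
    if actions in (["no-op"], ["read"]):
        return None
    if set(actions) == {"create", "delete"}:
        return "replace"  # either delete-then-create or create-before-destroy
    return actions[0]  # e.g. "create", "update", "delete"


def summarize_plan(plan_json):
    """Build a short review comment from `terraform show -json` output."""
    plan = json.loads(plan_json)
    counts = Counter()
    lines = []
    for rc in plan.get("resource_changes", []):
        label = classify(rc["change"]["actions"])
        if label is None:
            continue
        counts[label] += 1
        lines.append(f"- {label}: {rc['address']}")
    header = ", ".join(
        f"{counts[k]} to {k}" for k in ("create", "update", "replace", "delete")
    )
    return "\n".join([f"Plan: {header}"] + lines)


example = json.dumps({"resource_changes": [
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
]})
print(summarize_plan(example))
# Plan: 1 to create, 0 to update, 1 to replace, 0 to delete
# - create: aws_s3_bucket.logs
# - replace: aws_db_instance.main
```

Listing replacements separately is deliberate: a replaced database is the line a reviewer most needs to see.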
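# GitLab CI — Terraform pipeline with plan as artifact
stages: [plan, apply]
.terraform_base:
  image:
    name: hashicorp/terraform:1.9
    # the image's entrypoint is terraform itself; clear it so GitLab can run shell scripts
    entrypoint: [""]
  before_script:
    - terraform init -backend-config="key=${TF_STATE_KEY}"
plan:
  extends: .terraform_base
  stage: plan
  script:
    # save the binary plan, then render a readable copy for review
    # (no pipe to tee, so a failing plan fails the job)
    - terraform plan -out=tfplan.binary -no-color
    - terraform show -no-color tfplan.binary > plan.txt
  artifacts:
    paths: [tfplan.binary, plan.txt]
apply:
  extends: .terraform_base
  stage: apply
  script:
    # apply exactly the approved plan file; never re-plan here
    - terraform apply -auto-approve tfplan.binary
  needs: [plan]
  when: manual
  environment: production
Policy as Code: preventing mistakes before apply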
In enterprise infrastructure, not every Terraform change should be allowed to apply without additional validation. Open Policy Agent (OPA) with Conftest lets you define policies that run against the JSON form of the Terraform plan (terraform show -json tfplan.binary > plan.json) before apply. Typical policies:
- Block S3/GCS buckets without encryption enabled
- Block security groups with 0.0.0.0/0 on administrative ports (22, 3389)
- Verify all resources have cost, team, and environment tags
- Enforce instance type restrictions per environment (no t3.micro in production)
- Validate RDS instances are not publicly accessible
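Before encoding rules like these in Rego, it can help to prototype the logic against a real plan JSON. The Python sketch below covers two of the bullets, the 0.0.0.0/0 rule for ports 22 and 3389 and the required tags; it illustrates the checks and is not a replacement for Conftest. It assumes the inline ingress attribute of aws_security_group (from_port, to_port, protocol, cidr_blocks) and lowercase tag keys, and it reads tags_all (which includes provider default_tags), falling back to tags. Resources whose tags are unknown until apply, and IPv6 ::/0 rules, would need extra handling.

```python
ADMIN_PORTS = (22, 3389)
REQUIRED_TAGS = {"cost", "team", "environment"}  # assumed lowercase tag keys


def violations(plan):
    """Return violations found in a parsed `terraform show -json` plan."""
    found = []
    for rc in plan.get("resource_changes", []):
        after = rc["change"].get("after")
        if after is None:  # resource is being destroyed
            continue
        addr = rc["address"]
        if rc["type"] == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" not in (rule.get("cidr_blocks") or []):
                    continue  # IPv6 ::/0 is not checked in this sketch
                all_ports = rule.get("protocol") == "-1"  # "-1" means all traffic
                for port in ADMIN_PORTS:
                    if all_ports or rule["from_port"] <= port <= rule["to_port"]:
                        found.append(f"{addr}: port {port} open to 0.0.0.0/0")
        # Only resources that expose tags are checked.
        if "tags_all" in after or "tags" in after:
            tags = after.get("tags_all") or after.get("tags") or {}
            missing = REQUIRED_TAGS - set(tags)
            if missing:
                found.append(f"{addr}: missing tags {sorted(missing)}")
    return found


example_plan = {"resource_changes": [
    {"address": "aws_security_group.bastion", "type": "aws_security_group",
     "change": {"actions": ["create"], "after": {
         "ingress": [{"from_port": 22, "to_port": 22, "protocol": "tcp",
                      "cidr_blocks": ["0.0.0.0/0"]}],
         "tags": {"cost": "cc-42", "team": "platform", "environment": "production"}}}},
]}
print(violations(example_plan))
# ['aws_security_group.bastion: port 22 open to 0.0.0.0/0']
```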
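# OPA policy — verify S3 encryption
# Run with: conftest test --namespace terraform.aws plan.json
package terraform.aws
import rego.v1
deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_s3_bucket"
  # skip deletions: "after" is null when the bucket is being destroyed
  resource.change.after != null
  not resource.change.after.server_side_encryption_configuration
  msg := sprintf(
    "S3 bucket '%v' requires encryption to be configured",
    [resource.address]
  )
}
This rule uses the "deny contains msg if" syntax required by OPA 1.0, which recent 0.x releases also accept through import rego.v1. It checks the inline server_side_encryption_configuration block; from AWS provider v4 onward, encryption is normally declared in a separate aws_s3_bucket_server_side_encryption_configuration resource, so a production policy would need to look for that resource as well.
Workspaces vs. Separate Directories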
Terraform workspaces are useful for ephemeral environments (test environments that are created and destroyed frequently). For production and staging, especially when they differ structurally, separate directories are more explicit and safer: the target environment is visible in the path and has its own backend configuration, which removes the risk of applying to the wrong workspace.
Warning sign: if anyone on your team has applied production changes thinking they were in staging (due to a workspace mix-up), it's time to migrate to separate directories.
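Where workspaces stay in use for ephemeral environments, a cheap safeguard is to refuse to apply unless the selected workspace matches what the operator or pipeline explicitly expects. A minimal Python sketch, assuming the caller passes the output of terraform workspace show and a hypothetical EXPECTED_WORKSPACE value; the PROTECTED set encodes this article's recommendation and is an illustrative policy, not a Terraform feature.

```python
import subprocess


class WrongWorkspace(Exception):
    """Raised when the selected workspace is not the one the caller expects."""


# Illustrative policy: these environments should live in separate directories.
PROTECTED = frozenset({"production", "staging"})


def check_workspace(current, expected):
    """Return the workspace name if it is safe to apply, otherwise raise."""
    current = current.strip()  # CLI output may end with a newline
    if current != expected:
        raise WrongWorkspace(f"selected workspace is '{current}', expected '{expected}'")
    if current in PROTECTED:
        raise WrongWorkspace(f"'{current}' should be applied from its own directory")
    return current


def selected_workspace():
    """Ask the Terraform CLI which workspace is selected in the current directory."""
    result = subprocess.run(
        ["terraform", "workspace", "show"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Usage (requires terraform on PATH; EXPECTED_WORKSPACE is hypothetical):
# check_workspace(selected_workspace(), os.environ["EXPECTED_WORKSPACE"])
print(check_workspace("pr-1234\n", "pr-1234"))  # pr-1234
```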
Frequently Asked Questions
Terraform or Pulumi for an enterprise new to IaC?
How do I import resources that already exist and weren't created with Terraform?
How long does it take to implement Terraform in a company managing infra manually?
Is Terragrunt necessary, or can you do it with pure Terraform?
How do I manage secrets in Terraform (database passwords, API keys)?
Is your team managing cloud infrastructure with scripts or ClickOps? We can help you migrate to Terraform with the right process, without disrupting current operations.