Terraform for Enterprise Infrastructure

Terraform (and its fork OpenTofu) is the de facto standard for Infrastructure as Code in enterprises. Not because it's perfect, but because it solves the real problems of teams managing infrastructure at scale: shared state, reusable modules, plan before apply, and genuine multi-cloud support. The transition from bash scripts or ClickOps to Terraform isn't purely technical — it fundamentally changes how the team works: changes are proposed, reviewed, and approved like code.

Module structure that scales with the team

The most common mistake in companies adopting Terraform: monolithic modules that define all infrastructure in a single directory. This creates blast radius problems (one error affects all infra), long plan/apply times, and makes it difficult for multiple teams to work in parallel.

text
infrastructure/
├── modules/
│   ├── eks-cluster/        # reusable module: EKS cluster
│   ├── rds-postgres/       # reusable module: RDS database
│   ├── vpc-networking/     # reusable module: VPC, subnets, routing
│   └── monitoring-stack/   # reusable module: Prometheus + Grafana
├── environments/
│   ├── production/
│   │   ├── main.tf         # invokes modules with production vars
│   │   ├── variables.tf
│   │   └── backend.tf
│   └── staging/
│       ├── main.tf
│       └── backend.tf
└── global/
    ├── iam/                # cross-environment roles and policies
    └── dns/                # shared DNS zones

Modules are code with a defined interface; environments are invocations of those modules with specific parameters. The platform team maintains the modules while product teams configure their environments independently without coupling.
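As a rough sketch of what such an invocation could look like, the production main.tf below calls two of the modules from the tree above. The input and output names (environment, cidr_block, cluster_name, vpc_id, private_subnet_ids) are hypothetical; the real interface is whatever each module's variables.tf and outputs.tf declare.

```hcl
# environments/production/main.tf (illustrative sketch; input and output names are assumed)

module "vpc" {
  source = "../../modules/vpc-networking"

  # Hypothetical inputs; the module's variables.tf defines the real ones
  environment = "production"
  cidr_block  = "10.0.0.0/16"
}

module "eks" {
  source = "../../modules/eks-cluster"

  cluster_name = "production-main"

  # Wire modules together through outputs instead of hard-coded IDs
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}
```

Staging would call the same modules with its own values, so a fix in a module reaches every environment while each environment keeps its own state and its own blast radius.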

Remote State: the most costly Terraform mistake in enterprises

A local state file is the beginner mistake with the most severe production consequences. In an enterprise, multiple people apply changes. Without remote state and locking, two simultaneous applies can corrupt the state. If state is lost or corrupted, Terraform no longer knows which resources exist and may try to recreate them; in production, that can be a serious incident.

hcl
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/eks/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    kms_key_id     = "arn:aws:kms:us-east-1:123456789:key/abc-123"
  }
}

DynamoDB provides the locking that prevents two simultaneous terraform apply runs. Terraform writes resource attributes to state in plain text, so KMS encryption is mandatory for state files that contain sensitive values (RDS passwords, API tokens). In GCP, the equivalent is a GCS bucket with versioning and uniform bucket-level access enabled.
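For the GCP case, a minimal sketch might look like the following. The bucket name, location, and prefix are assumptions chosen to mirror the S3 example, and the two blocks would normally live in separate configurations: the bucket in a one-time bootstrap project, the backend block in each environment.

```hcl
# Bootstrap configuration (applied once, separately): the state bucket itself
resource "google_storage_bucket" "terraform_state" {
  name     = "company-terraform-state" # hypothetical; bucket names are globally unique
  location = "US"

  uniform_bucket_level_access = true

  # Keep previous state versions so a bad write can be rolled back
  versioning {
    enabled = true
  }
}

# In each environment: point the backend at that bucket
terraform {
  backend "gcs" {
    bucket = "company-terraform-state"
    prefix = "production/gke" # hypothetical path, one prefix per state
  }
}
```

The gcs backend supports state locking on its own, so no separate lock table appears in this sketch.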

Terragrunt: DRY configuration across environments

Native Terraform has a repetition problem across environments: staging and production code is practically identical except for a few values. Terragrunt adds a configuration layer that lets you define backend, providers, and common configuration exactly once.

hcl
# root terragrunt.hcl — shared configuration for all environments
remote_state {
  backend = "s3"
  config = {
    bucket         = "company-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

Each environment has only a minimal terragrunt.hcl with environment-specific values. Most of the configuration lives in the root and is inherited through an include block. This eliminates the duplication that would otherwise force a backend configuration change to touch 4-5 files.
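A child configuration could be as small as the sketch below. It assumes the root terragrunt.hcl sits at the repository root and that this file lives at environments/production/eks/; the module path and input names are hypothetical.

```hcl
# environments/production/eks/terragrunt.hcl (illustrative sketch)

# Pull in the root terragrunt.hcl: remote_state and the generated backend.tf
include "root" {
  path = find_in_parent_folders()
}

# Which module this unit deploys; the relative path is an assumption about the repo layout
terraform {
  source = "../../../modules//eks-cluster"
}

# Environment-specific values, passed to the module as input variables (names are hypothetical)
inputs = {
  cluster_name  = "production-main"
  instance_type = "m5.large"
}
```

Under those assumptions, path_relative_to_include() in the root resolves to environments/production/eks, so each unit gets its own state key without declaring one. Recent Terragrunt releases recommend giving the root file a different name (for example root.hcl) and passing that name to find_in_parent_folders; check the version you run.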

CI/CD Pipeline for Terraform: plan as artifact

The correct CI/CD flow for enterprise Terraform: terraform plan on each PR with changes commented directly in the PR, team review and approval, and terraform apply (automatic or manual with explicit approval) on merge. The plan file must be saved as a pipeline artifact and used in the apply — never generate a new plan — to guarantee that what was approved is exactly what gets applied.

yaml
# GitLab CI — Terraform pipeline with plan as artifact
.terraform_base:
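  image:
    name: hashicorp/terraform:1.9
    entrypoint: [""]  # the image's entrypoint is the terraform binary; clear it so GitLab can run a shell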
  before_script:
    - terraform init -backend-config="key=${TF_STATE_KEY}"

plan:
  extends: .terraform_base
  script:
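    # Save the binary plan, then render it as text for the PR comment.
    # Running plan on its own line keeps a failed plan from being hidden by a pipe.
    - terraform plan -out=tfplan.binary -no-color
    - terraform show -no-color tfplan.binary > plan.txt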
  artifacts:
    paths: [tfplan.binary, plan.txt]

apply:
  extends: .terraform_base
  script:
    - terraform apply -auto-approve tfplan.binary
  needs: [plan]
  when: manual
  environment: production
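
The apply job consumes the tfplan.binary artifact from the plan stage and never generates a plan of its own. If apply generated a fresh plan, what the team approved and what actually gets applied could differ. If the state has changed since the plan was created, Terraform rejects the saved plan as stale and the plan stage has to be re-run.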

Policy as Code: preventing mistakes before apply

In enterprise infrastructure, not all Terraform changes should be able to apply without additional validation. Open Policy Agent with Conftest lets you define policies that run against the Terraform plan before apply.

  • Block S3/GCS buckets without encryption enabled
  • Block security groups with 0.0.0.0/0 on administrative ports (22, 3389)
  • Verify all resources have cost, team, and environment tags
  • Enforce instance type restrictions per environment (no t3.micro in production)
  • Validate RDS instances are not publicly accessible

rego
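# OPA policy: verify S3 encryption
# Input is the plan as JSON: terraform show -json tfplan.binary > plan.json
# Conftest evaluates package "main" by default, so run: conftest test --namespace terraform.aws plan.json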
package terraform.aws

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after != null  # skip resources that are being destroyed
  not resource.change.after.server_side_encryption_configuration
  msg := sprintf(
    "S3 bucket '%v' requires encryption to be configured",
    [resource.address]
  )
}
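Policies catch violations; the Terraform code can also make some of them harder to commit in the first place. The sketch below is one possible complement to the tagging and instance-type rules in the list above, not a replacement for the policy checks. Tag values and the allowed instance types are hypothetical.

```hcl
# Tags applied by default to taggable resources created through this provider
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      cost-center = "platform" # hypothetical values
      team        = "infrastructure"
      environment = "production"
    }
  }
}

# Fail at plan time if someone picks an instance type outside the allowed set
variable "instance_type" {
  type    = string
  default = "m5.large"

  validation {
    condition     = contains(["m5.large", "m5.xlarge"], var.instance_type)
    error_message = "Production instances must be m5.large or m5.xlarge."
  }
}
```

Validation of this kind only covers inputs that go through the variable, which is why the plan-level policy remains the enforcement point.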

Workspaces vs. Separate Directories

Terraform workspaces are useful for ephemeral environments (test environments that get created and destroyed frequently). For production and staging with significant structural differences, separate directories are more explicit and safer — they eliminate the risk of applying to the wrong workspace.

Warning sign: if anyone on your team has applied production changes thinking they were in staging (due to a workspace mix-up), it's time to migrate to separate directories.
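For the ephemeral case where workspaces do fit, a minimal sketch could derive resource names from the workspace so parallel test environments do not collide. The preview- prefix and the queue resource are assumptions for illustration.

```hcl
# Ephemeral test environment: one workspace per branch or test run
locals {
  # terraform.workspace is the name of the currently selected workspace
  name_prefix = "preview-${terraform.workspace}"
}

resource "aws_sqs_queue" "jobs" {
  # Including the workspace name keeps parallel environments from colliding
  name = "${local.name_prefix}-jobs"
}
```

A typical lifecycle would be terraform workspace new feature-x, terraform apply, then terraform destroy followed by terraform workspace select default and terraform workspace delete feature-x once the environment is no longer needed.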

Frequently Asked Questions

Terraform or Pulumi for an enterprise new to IaC?
Terraform if the team includes operations engineers who aren't professional developers — HCL is more accessible for Ops profiles. Pulumi if the platform team consists mainly of software engineers who prefer Python, TypeScript, or Go. For most enterprises, Terraform/OpenTofu is the more pragmatic choice for its module ecosystem, documentation, and wider talent availability.

How do I import resources that already exist and weren't created with Terraform?
With terraform import. The process: write the resource definition in HCL, run terraform import aws_instance.app i-1234567890abcdef0 to associate the existing resource with state, then terraform plan to verify no unintended changes. With Terraform 1.5+, the import block in HCL lets you do this declaratively as part of your code.
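The declarative form might look like the sketch below, reusing the instance ID from the command above. The ami and instance_type values are placeholders; they have to match the real instance, otherwise the plan will show changes.

```hcl
# Declarative import (Terraform 1.5+): adopt an existing instance into state
import {
  to = aws_instance.app
  id = "i-1234567890abcdef0"
}

# The resource definition must describe the instance as it exists today.
# The values below are placeholders, not real settings.
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.medium"
}
```

Running terraform plan then reports the resource as one to import alongside any attribute differences. If the resource block has not been written yet, terraform plan -generate-config-out=generated.tf can draft it from the live resource (the feature is marked experimental), and the result should be reviewed before committing.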

How long does it take to implement Terraform in a company managing infra manually?
The longest process isn't writing the code — it's importing existing infrastructure and establishing the team workflow. In real projects, the first critical module (EKS cluster or RDS database) with a working pipeline takes 2-3 weeks. Full infrastructure coverage is a 3-6 month process depending on complexity.

Is Terragrunt necessary, or can you do it with pure Terraform?
Terragrunt isn't necessary for small teams or environments with little variation. It becomes valuable when you have 3+ environments with repeated backend configuration, or when duplication between environments becomes a real operational problem. The rule: start with pure Terraform and add Terragrunt when the pain of repetition is concrete.

How do I manage secrets in Terraform (database passwords, API keys)?
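Never commit secret values in Terraform code. Options: (1) TF_VAR_name environment variables for values passed at runtime, (2) integration with AWS Secrets Manager or HashiCorp Vault via data sources that read values at plan/apply time, (3) marking variables and outputs with sensitive = true so Terraform redacts them in plan and apply output. Values that reach a resource or an output are still written to the state file, which is another reason to encrypt remote state and restrict access to it.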
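The three options could be sketched as follows. The variable name, secret name, and output name are hypothetical.

```hcl
# Option 1: value supplied at runtime, e.g. via the TF_VAR_db_password environment variable
variable "db_password" {
  type      = string
  sensitive = true
}

# Option 2: read the secret from AWS Secrets Manager at plan/apply time
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "production/rds/app" # hypothetical secret name
}

# Option 3: mark outputs that carry secrets so they are redacted in CLI output
output "db_password" {
  value     = data.aws_secretsmanager_secret_version.db.secret_string
  sensitive = true
}
```

In practice a configuration would usually pick one source for a given secret rather than combining all three.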

Engineering Team — IQS

Software, cloud, and DevOps engineers with enterprise project experience.