IQS | Terraform for Enterprise Infrastructure

Module structure that scales with the team

The most common mistake in companies adopting Terraform is the monolithic module that defines all infrastructure in a single directory. This widens the blast radius (one error can affect all infrastructure), lengthens plan and apply times, and makes it difficult for multiple teams to work in parallel. A layout that avoids this separates reusable modules from per-environment configuration:

text

infrastructure/
├── modules/
│   ├── eks-cluster/        # reusable module: EKS cluster
│   ├── rds-postgres/       # reusable module: RDS database
│   ├── vpc-networking/     # reusable module: VPC, subnets, routing
│   └── monitoring-stack/   # reusable module: Prometheus + Grafana
├── environments/
│   ├── production/
│   │   ├── main.tf         # invokes modules with production vars
│   │   ├── variables.tf
│   │   └── backend.tf
│   └── staging/
│       ├── main.tf
│       └── backend.tf
└── global/
    ├── iam/                # cross-environment roles and policies
    └── dns/                # shared DNS zones

Modules are code with a defined interface; environments are invocations of those modules with specific parameters. The platform team maintains the modules while product teams configure their environments independently without coupling.
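As a rough illustration of that split, environments/production/main.tf might contain little more than module calls. The input names (environment, cidr_block, cluster_name, subnet_ids) and the outputs vpc_id and private_subnet_ids are hypothetical; the real interface is whatever each module's variables and outputs declare.

```hcl
# environments/production/main.tf (sketch; all input and output names are assumed)
module "vpc" {
  source = "../../modules/vpc-networking"

  environment = "production"
  cidr_block  = "10.0.0.0/16"
}

module "eks" {
  source = "../../modules/eks-cluster"

  environment  = "production"
  cluster_name = "prod-main"

  # Outputs of one module feed the inputs of another
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}
```

Under this arrangement, staging/main.tf would call the same modules with different values, so a module fix reaches both environments without copy-pasting resources.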

Remote State: the most costly Terraform mistake in enterprises

A local state file is the beginner mistake with the most severe production consequences. In an enterprise, multiple people apply changes. Without remote state and locking, two simultaneous applies can corrupt the state. If state is lost or corrupted, Terraform no longer knows which resources exist and may try to recreate them. In production, that can be a serious incident.

hcl

terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/eks/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/abc-123"
  }
}

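DynamoDB provides the locking that prevents two simultaneous terraform apply runs against the same state. KMS encryption is mandatory because state stores resource attributes in plain text, including sensitive ones such as RDS passwords and API tokens. In GCP, the equivalent is a GCS bucket with versioning and uniform bucket-level access enabled; the gcs backend handles locking itself, so no separate lock table is needed.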
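The bucket and lock table referenced by the backend have to exist before the first terraform init. Below is a minimal sketch of that bootstrap, assuming AWS provider v4 or later and reusing the names from the backend block; the resource labels (state, state_lock) are illustrative. The one detail that cannot vary is the table's partition key, which the S3 backend expects to be a string attribute named LockID.

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"
}

# Versioning makes it possible to roll back to an earlier state file
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State must never be publicly readable
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

These resources are usually applied once from a small, separate configuration, since they cannot store their own state in a bucket that does not exist yet.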

Terragrunt: DRY configuration across environments

Native Terraform has a repetition problem across environments: staging and production code is practically identical except for a few values. Terragrunt adds a configuration layer that lets you define backend, providers, and common configuration exactly once.

hcl

# root terragrunt.hcl — shared configuration for all environments
remote_state {
  backend = "s3"
  config = {
    bucket         = "company-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
}

Each environment then needs only a minimal terragrunt.hcl that includes the root file and sets its environment-specific values. Most of the configuration lives in the root and is inherited automatically, so a backend change is made in one place instead of in every environment's backend.tf.
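For comparison, a child file under this scheme might look like the sketch below. The location environments/production/eks/terragrunt.hcl, the module source path, and the input names are assumptions for illustration; find_in_parent_folders() walks up the directory tree to locate the root terragrunt.hcl.

```hcl
# environments/production/eks/terragrunt.hcl (hypothetical location)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  # Path is relative to this file
  source = "../../../modules/eks-cluster"
}

inputs = {
  environment  = "production"
  cluster_name = "prod-main"
}
```

If the root file sits at the top of the repository, path_relative_to_include() would resolve the state key for this unit to environments/production/eks/terraform.tfstate, so every unit gets its own state file without declaring one.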

CI/CD Pipeline for Terraform: plan as artifact

The correct CI/CD flow for enterprise Terraform: terraform plan on each PR with changes commented directly in the PR, team review and approval, and terraform apply (automatic or manual with explicit approval) on merge. The plan file must be saved as a pipeline artifact and used in the apply — never generate a new plan — to guarantee that what was approved is exactly what gets applied.

yaml

# GitLab CI — Terraform pipeline with plan as artifact
.terraform_base:
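  image:
    name: hashicorp/terraform:1.9
    entrypoint: [""]  # the image's entrypoint is the terraform binary; reset it so GitLab can run the script in a shell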
  before_script:
    - terraform init -backend-config="key=${TF_STATE_KEY}"

plan:
  extends: .terraform_base
  script:
    - terraform plan -out=tfplan.binary -no-color 2>&1 | tee plan.txt
  artifacts:
    paths: [tfplan.binary, plan.txt]

apply:
  extends: .terraform_base
  script:
    - terraform apply -auto-approve tfplan.binary
  needs: [plan]
  when: manual
  environment: production

Because apply receives tfplan.binary, it executes exactly the actions reviewers saw in plan.txt. If the state has changed since the plan was created, Terraform rejects the saved plan as stale instead of applying something different, and the pipeline has to be re-run from the plan stage.

Policy as Code: preventing mistakes before apply

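In enterprise infrastructure, not every Terraform change should be allowed to apply without additional validation. Open Policy Agent with Conftest lets you define policies that run against the Terraform plan before apply: export the saved plan as JSON with terraform show -json tfplan.binary > tfplan.json, then evaluate it with conftest test tfplan.json. Typical policies: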

Block S3/GCS buckets without encryption enabled
Block security groups with 0.0.0.0/0 on administrative ports (22, 3389)
Verify all resources have cost, team, and environment tags
Enforce instance type restrictions per environment (no t3.micro in production)
Validate RDS instances are not publicly accessible

rego

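# OPA policy: verify S3 encryption
# Assumes encryption is declared inline on aws_s3_bucket (AWS provider < 4.0).
# With provider v4+ it lives in aws_s3_bucket_server_side_encryption_configuration,
# so this check needs adapting. Conftest loads package main by default;
# run with --namespace terraform.aws.
package terraform.aws

import rego.v1

deny contains msg if {
  some resource in input.resource_changes
  resource.type == "aws_s3_bucket"
  resource.change.after != null  # skip buckets that are being deleted
  # An unset block appears as an empty list in the plan JSON, so test its length
  count(object.get(resource.change.after, "server_side_encryption_configuration", [])) == 0
  msg := sprintf(
    "S3 bucket '%v' requires encryption to be configured",
    [resource.address]
  )
}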
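Some of the simpler rules in the list above can also be caught earlier, at the module or environment interface, before a plan is even produced. The sketch below uses a Terraform variable validation block for the instance type rule. The variable name and the list of disallowed types are assumptions. It complements OPA rather than replacing it, because it checks inputs and not the resulting plan.

```hcl
# environments/production/variables.tf (sketch; variable name is hypothetical)
variable "node_instance_type" {
  description = "EC2 instance type for worker nodes"
  type        = string

  validation {
    # Reject the smallest burstable sizes in production
    condition     = !contains(["t3.micro", "t3.nano"], var.node_instance_type)
    error_message = "Production does not allow t3.micro or t3.nano instances."
  }
}
```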

Workspaces vs. Separate Directories

Terraform workspaces are useful for ephemeral environments (test environments that get created and destroyed frequently). For production and staging with significant structural differences, separate directories are more explicit and safer — they eliminate the risk of applying to the wrong workspace.
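For the ephemeral case, a common pattern is to derive resource names from terraform.workspace so that each workspace gets its own isolated copy. A small sketch follows, where the bucket and the naming scheme are made up for illustration:

```hcl
locals {
  # terraform.workspace is "default" unless another workspace is selected
  name_prefix = "preview-${terraform.workspace}"
}

resource "aws_s3_bucket" "artifacts" {
  # S3 names are global, so a real prefix would need to be unique to the company
  bucket = "${local.name_prefix}-artifacts"

  tags = {
    environment = terraform.workspace
  }
}
```

A test environment would then be created with terraform workspace new followed by an apply, and removed with terraform destroy and terraform workspace delete.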

Warning sign: if anyone on your team has applied production changes thinking they were in staging (due to a workspace mix-up), it's time to migrate to separate directories.

Frequently Asked Questions

Terraform or Pulumi for an enterprise new to IaC?

Terraform if the team includes operations engineers who aren't professional developers — HCL is more accessible for Ops profiles. Pulumi if the platform team consists mainly of software engineers who prefer Python, TypeScript, or Go. For most enterprises, Terraform/OpenTofu is the more pragmatic choice for its module ecosystem, documentation, and wider talent availability.

How do I import resources that already exist and weren't created with Terraform?

With terraform import. The process: write the resource definition in HCL, run terraform import aws_instance.app i-1234567890abcdef0 to associate the existing resource with state, then terraform plan to verify no unintended changes. With Terraform 1.5+, the import block in HCL lets you do this declaratively as part of your code.
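The declarative form could look like the sketch below, reusing the instance ID from the command above. The ami and instance_type values are placeholders and would have to match the real instance.

```hcl
# Requires Terraform 1.5 or later
import {
  to = aws_instance.app
  id = "i-1234567890abcdef0"
}

resource "aws_instance" "app" {
  # Placeholder values; any mismatch with the real instance shows up as a diff in plan
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.medium"
}
```

If the resource block has not been written yet, terraform plan -generate-config-out=generated.tf can draft one from the imported resource as a starting point for review. Once the import has been applied, the import block can be deleted.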

How long does it take to implement Terraform in a company managing infra manually?

The longest process isn't writing the code — it's importing existing infrastructure and establishing the team workflow. In real projects, the first critical module (EKS cluster or RDS database) with a working pipeline takes 2-3 weeks. Full infrastructure coverage is a 3-6 month process depending on complexity.

Is Terragrunt necessary, or can you do it with pure Terraform?

Terragrunt isn't necessary for small teams or environments with little variation. It becomes valuable when you have 3+ environments with repeated backend configuration, or when duplication between environments becomes a real operational problem. The rule: start with pure Terraform and add Terragrunt when the pain of repetition is concrete.

How do I manage secrets in Terraform (database passwords, API keys)?

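Never commit secret values in Terraform code. Options: (1) TF_VAR_name environment variables for values passed at runtime, (2) integration with AWS Secrets Manager or HashiCorp Vault via data sources that read values at plan/apply time, (3) marking variables and outputs with sensitive = true so Terraform redacts them in plan and apply output. None of these by itself keeps a value out of the state file once a resource or output uses it, which is one more reason the remote state backend must be encrypted and access-controlled.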
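The three options can be sketched together as follows. The variable name, the secret name prod/app/api-key, and the output are hypothetical; the data source assumes the AWS provider.

```hcl
# Option 1: value supplied at runtime through the TF_VAR_db_password environment variable
variable "db_password" {
  type      = string
  sensitive = true
}

# Option 2: read from AWS Secrets Manager at plan/apply time
data "aws_secretsmanager_secret_version" "api_key" {
  secret_id = "prod/app/api-key" # hypothetical secret name
}

# Option 3: redact the value in CLI output
output "api_key" {
  value     = data.aws_secretsmanager_secret_version.api_key.secret_string
  sensitive = true
}
```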
