Table of contents
- Why Environment Separation Is Hard
- terraform workspace
- Limitations of Workspaces
- Directory-Based Environment Separation
- A Brief Introduction to Terragrunt
- Strategy Comparison
- Combining with CI/CD
## Why Environment Separation Is Hard
“Take the infrastructure deployed in dev and create the same thing in prod.” Sounds simple, but in practice it’s tricky. Prod has different sizing, different backup policies, and different access permissions. It’s not about creating the exact same thing twice — it’s about reproducing the same pattern with slightly different parameters.
Separating environments also means fully separating state. A mistake in dev must not affect prod. If they share the same state, one side’s work can break the other.
```mermaid
flowchart TB
    subgraph "Environment Separation Requirements"
        direction TB
        Sep1["Complete state separation"]
        Sep2["Allow variable/config differences"]
        Sep3["Access control separation"]
        Sep4["Differentiated approval processes"]
        Sep5["Minimize code duplication"]
    end
```
Terraform offers two solutions: workspaces and directory-based separation. Terragrunt exists as a complementary tool. Let’s look at all three in order.
## terraform workspace
By default, Terraform operates in a workspace called default. Switching workspaces separates the state files.
```bash
# Check current workspace
terraform workspace show
# default

# Create new workspaces
terraform workspace new dev
terraform workspace new prod

# Switch workspace
terraform workspace select dev

# List
terraform workspace list
# default
# * dev
#   prod
```
When using a remote backend (like S3), state file paths are automatically separated per workspace. For example, if your backend configuration is:
```hcl
terraform {
  backend "s3" {
    bucket = "my-tfstate"
    key    = "app/terraform.tfstate"
    region = "ap-northeast-2"
  }
}
```
The actual S3 paths look like this.
```text
s3://my-tfstate/
├── app/terraform.tfstate             # default
└── env:/
    ├── dev/app/terraform.tfstate     # dev workspace
    └── prod/app/terraform.tfstate    # prod workspace
```
In code, you can reference the current workspace name via the terraform.workspace variable.
```hcl
locals {
  environment = terraform.workspace

  instance_count = {
    dev     = 1
    staging = 2
    prod    = 5
  }[terraform.workspace]
}

resource "aws_instance" "app" {
  count         = local.instance_count
  instance_type = local.environment == "prod" ? "m5.large" : "t3.micro"

  tags = {
    Environment = local.environment
  }
}
```
At first glance, it looks clean. One set of code, just switch workspaces to get multiple environments. But in practice, this can become a trap.
## Limitations of Workspaces
Even the official HashiCorp documentation recommends workspaces only for “lightweight separation for slightly different environments.” It implies they’re unsuitable for production separation. Here’s why.
### 1) Shared code means shared risk
Suppose you make a bad code change in the dev workspace. That same code is what prod runs. The model isn't "verify in dev, then promote to prod" — it's "run the same code with a different workspace selected." That makes it hard to add verification steps specifically for production.
### 2) Configuration files aren't separated
Workspaces separate state, but share code. You use terraform.workspace to branch environment-specific settings, but as branches multiply, code becomes complex and readability drops.
```hcl
resource "aws_db_instance" "db" {
  instance_class = {
    dev     = "db.t3.micro"
    staging = "db.t3.medium"
    prod    = "db.r5.xlarge"
  }[terraform.workspace]

  allocated_storage = {
    dev     = 20
    staging = 50
    prod    = 500
  }[terraform.workspace]

  backup_retention_period = {
    dev     = 1
    staging = 7
    prod    = 30
  }[terraform.workspace]

  # ... workspace branching for dozens of attributes
}
```
Even this much is already unpleasant to read. You could extract to variable files, but then it’s not much different from directory separation.
### 3) The backend itself can't be separated
Workspaces only differ in key paths within the same backend. If you want to put the dev state in a different AWS account’s S3 bucket, workspaces can’t do that. For fully separated backends per environment, you need a different approach.
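One such approach (a sketch, not something the article's setup uses) is Terraform's partial backend configuration: leave the backend block empty and supply per-environment settings at init time.

```hcl
# backend.tf -- partial configuration; all settings supplied at init time
terraform {
  backend "s3" {}
}
```

Each environment then initializes with its own settings, e.g. `terraform init -backend-config=backends/dev.s3.tfbackend`, where the file path and name are illustrative. Since each config file can point at a bucket in a different AWS account, this gets around the shared-backend limitation, at the cost of remembering the right flag per environment.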
### 4) Error-prone
The current workspace isn't shown in your shell prompt. What if you run terraform apply thinking you're in dev, but the prod workspace is actually selected? Without guardrails or automation, mistakes like this are easy to make.
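A common guardrail is a small wrapper that refuses to apply unless the active workspace matches one you name explicitly. A minimal sketch (the function name is hypothetical):

```shell
# Hypothetical guard: refuse to apply unless the active workspace
# matches the one named on the command line.
safe_apply() {
  local expected="$1"
  local current
  current="$(terraform workspace show)"
  if [ "$current" != "$expected" ]; then
    echo "refusing: current workspace is '$current', expected '$expected'" >&2
    return 1
  fi
  terraform apply
}
```

Usage: `safe_apply dev` applies only if `terraform workspace show` prints `dev`; otherwise it aborts before touching any state.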
Summary: Workspaces are suitable for short-lived experimental sandboxes within the same team or per-feature preview environments. They’re unsuitable for long-term dev/staging/prod separation.
## Directory-Based Environment Separation
This is the far more common approach in practice. Environments are completely separated by directory.
```text
infra/
├── modules/                   # Reusable modules
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── envs/
    ├── dev/
    │   ├── main.tf            # Module composition
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   └── backend.tf         # Dev state backend
    ├── staging/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   └── backend.tf         # Staging state backend
    └── prod/
        ├── main.tf
        ├── variables.tf
        ├── terraform.tfvars
        └── backend.tf         # Prod state backend
```
Each environment is an independent directory with an independent backend. Modules are shared, but per-environment settings live within each directory.
```mermaid
flowchart TB
    Modules["modules/\n(Reusable units)"]
    Dev["envs/dev/"]
    Staging["envs/staging/"]
    Prod["envs/prod/"]
    Modules --> Dev
    Modules --> Staging
    Modules --> Prod
    Dev -->|"terraform init/apply\n(from dev directory)"| DevState["dev state\n(dev S3 bucket)"]
    Staging -->|"terraform init/apply\n(from staging directory)"| StagingState["staging state\n(staging S3 bucket)"]
    Prod -->|"terraform init/apply\n(from prod directory)"| ProdState["prod state\n(prod S3 bucket)"]
```
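For example, a dev backend file might look like the sketch below. The bucket and table names are assumptions for illustration; the point is that each environment's backend.tf can reference a bucket in a completely different AWS account.

```hcl
# envs/dev/backend.tf -- sketch; bucket and table names are assumptions
terraform {
  backend "s3" {
    bucket         = "my-dev-tfstate"        # bucket in the dev AWS account
    key            = "app/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "my-dev-tflock"         # state locking
  }
}
```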
Each environment’s main.tf composes modules like this.
```hcl
# envs/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block  = "10.0.0.0/16"
  environment = "prod"
  azs         = ["ap-northeast-2a", "ap-northeast-2c", "ap-northeast-2d"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name = "prod-eks"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids

  node_groups = {
    default = {
      instance_type = "m5.large"
      min_size      = 3
      max_size      = 10
      desired_size  = 5
    }
  }
}
```

```hcl
# envs/dev/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block  = "10.10.0.0/16"   # Different CIDR from prod
  environment = "dev"
  azs         = ["ap-northeast-2a", "ap-northeast-2c"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name = "dev-eks"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids

  node_groups = {
    default = {
      instance_type = "t3.medium"   # Smaller than prod
      min_size      = 1
      max_size      = 3
      desired_size  = 1
    }
  }
}
```
Same modules, but different parameters per environment. Work is done from within each directory.
```bash
cd envs/dev
terraform init
terraform apply

# Switch to prod
cd ../prod
terraform init
terraform apply
```
### Advantages
- Completely independent state/backend per environment
- Can use different AWS accounts per environment
- Easy to enforce special approval processes for prod only
- Low probability of accidentally applying to the wrong environment
### Disadvantages
- Code duplication (each environment's `main.tf` is similar)
- Backend settings get hardcoded per environment
- Need to run `terraform init` separately for each environment
Most production teams use this approach. The duplication issue is partially mitigated by modularization, but not perfectly. Terragrunt fills this gap.
## A Brief Introduction to Terragrunt
Terragrunt is a Terraform wrapper made by Gruntwork. Its core philosophy is “use Terraform the DRY (Don’t Repeat Yourself) way.”
Terragrunt’s key idea is to declare repeated elements across environment directories (backend settings, providers, common variables) once and inherit them.
Here’s a typical Terragrunt project structure.
```text
infra/
├── terragrunt.hcl                   # Root config (global backend, provider)
├── _envcommon/                      # Common component definitions
│   ├── vpc.hcl
│   └── eks.hcl
└── envs/
    ├── dev/
    │   ├── env.hcl                  # Dev global variables
    │   ├── vpc/
    │   │   └── terragrunt.hcl       # Inherits _envcommon/vpc.hcl
    │   └── eks/
    │       └── terragrunt.hcl
    └── prod/
        ├── env.hcl
        ├── vpc/
        │   └── terragrunt.hcl
        └── eks/
            └── terragrunt.hcl
```
Declare the backend once in the root `terragrunt.hcl`:
```hcl
# infra/terragrunt.hcl
remote_state {
  backend = "s3"

  config = {
    bucket         = "my-company-tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "my-company-tflock"
  }

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
}
```
`path_relative_to_include()` is the key piece. The relative path of each subdirectory automatically becomes the state `key`. Running from envs/dev/vpc automatically sets the key to envs/dev/vpc/terraform.tfstate. Per-environment, per-component state separation comes for free.
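Concretely, running Terragrunt from envs/dev/vpc would generate a backend.tf roughly like this (derived from the root config shown above):

```hcl
# backend.tf generated into envs/dev/vpc by Terragrunt (sketch)
terraform {
  backend "s3" {
    bucket         = "my-company-tfstate"
    key            = "envs/dev/vpc/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "my-company-tflock"
  }
}
```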
Environment settings are declared once:
```hcl
# envs/dev/env.hcl
locals {
  environment = "dev"
  aws_region  = "ap-northeast-2"
  account_id  = "111122223333"
}
```
And here's an actual component declaration:
```hcl
# envs/dev/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

include "envcommon" {
  path = "${get_terragrunt_dir()}/../../../_envcommon/vpc.hcl"
}

locals {
  env = read_terragrunt_config(find_in_parent_folders("env.hcl")).locals
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  cidr_block  = "10.10.0.0/16"
  environment = local.env.environment
}
```
The production VPC’s terragrunt.hcl is nearly identical, with only the CIDR in inputs being different.
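For illustration, the prod counterpart might look like this (a sketch; the prod CIDR value is an assumption):

```hcl
# envs/prod/vpc/terragrunt.hcl -- sketch; only the inputs differ from dev
include "root" {
  path = find_in_parent_folders()
}

include "envcommon" {
  path = "${get_terragrunt_dir()}/../../../_envcommon/vpc.hcl"
}

locals {
  env = read_terragrunt_config(find_in_parent_folders("env.hcl")).locals
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  cidr_block  = "10.0.0.0/16"   # assumed prod CIDR; everything else is identical
  environment = local.env.environment
}
```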
Another powerful feature Terragrunt provides is automatic dependency management.
```hcl
# envs/dev/eks/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}
```
The dependency block automatically retrieves outputs from another Terragrunt project. No need to manually copy VPC outputs or fetch them via data sources.
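If the VPC hasn't been applied yet, its outputs don't exist. Terragrunt's dependency block supports mock outputs for exactly this case; a sketch (the mock values are placeholders):

```hcl
# envs/dev/eks/terragrunt.hcl -- dependency with mock outputs (sketch)
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder values, used only when real outputs aren't available
  # and only for the listed commands
  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```

This lets `terragrunt plan` succeed in a fresh environment before any dependency has been applied.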
### The run-all command
```bash
# Apply all sub-projects in dependency order
terragrunt run-all apply

# A specific environment only
cd envs/dev
terragrunt run-all apply
```
Terragrunt calculates the DAG and auto-applies in VPC → EKS → app order.
### When to adopt Terragrunt
- If there are 2 or fewer environments and things are simple, pure Terraform is sufficient
- If there are 3+ environments or components exceed 10, consider Terragrunt
- If using a multi-account structure (different AWS account per environment), Terragrunt has a clear advantage
- Factor in the learning curve for the entire team
## Strategy Comparison
Here’s a summary of the three strategies.
| Factor | workspace | Directory Separation | Terragrunt |
|---|---|---|---|
| State separation | Path separation within same backend | Fully independent backends | Fully independent backends |
| Multi-AWS account | Difficult | Possible | Possible, easy |
| Code duplication | Low | High (proportional to environments) | Low |
| Backend config | Once | Repeated per environment | Once |
| Learning curve | Low | Low | Medium |
| Dependency automation | None | None | Built-in |
| Large-scale suitability | Low | Medium | High |
### Practical recommendations
- Solo or 2-3 person small team: Start with pure Terraform + directory separation
- Mid-size team, 2-3 environments, 5-10 components: Directory separation + modularization
- Large team or multi-account: Consider Terragrunt adoption
Whichever approach you choose, prod state must always be in a completely separate backend. Separating prod with workspaces is risky in the long term.
## Combining with CI/CD
Environment separation is tightly coupled with CI/CD. A typical pipeline looks like this.
```mermaid
flowchart LR
    PR["PR Created"] --> Plan["terraform plan\n(all environments)"]
    Plan --> Review["Review & Approval"]
    Review --> Merge["Merge to main"]
    Merge --> Dev["dev auto-apply"]
    Dev --> Stg["staging auto-apply"]
    Stg --> Manual["prod manual approval"]
    Manual --> Prod["prod apply"]
```
At PR time, plan results for all environments are posted as comments. After merge, dev and staging auto-apply, while prod goes through a manual approval step. With directory-based separation, you run Terraform individually from each environment directory; with Terragrunt, run-all handles it all at once.
This pipeline is covered in detail in the next part (CI/CD integration).
Environment separation is a balancing point where “minimize code duplication” and “state independence” pull in opposite directions. Choose the right point based on team size and environment complexity. What’s certain is that prod must always have independent state.
In the next part, we’ll cover Kubernetes and Helm providers. We’ll also look at the criteria for deciding whether to manage cluster internals with Terraform or delegate to ArgoCD.

