Table of contents
- Why Environment Separation Is Hard
- terraform workspace
- Limitations of Workspaces
- Directory-Based Environment Separation
- A Brief Introduction to Terragrunt
- Strategy Comparison
- Combining with CI/CD
## Why Environment Separation Is Hard
“Take the infrastructure deployed in dev and create the same thing in prod.” Sounds simple, but in practice it’s tricky. Prod has different sizing, different backup policies, and different access permissions. It’s not about creating the exact same thing twice — it’s about reproducing the same pattern with slightly different parameters.
Separating environments also means fully separating state. A mistake in dev must not affect prod. If they share the same state, one side’s work can break the other.
```mermaid
flowchart TB
    subgraph "Environment Separation Requirements"
        direction TB
        Sep1["Complete state separation"]
        Sep2["Allow variable/config differences"]
        Sep3["Access control separation"]
        Sep4["Differentiated approval processes"]
        Sep5["Minimize code duplication"]
    end
```
Terraform offers two solutions: workspaces and directory-based separation. Terragrunt exists as a complementary tool. Let’s look at all three in order.
## terraform workspace
By default, Terraform operates in a workspace called default. Switching workspaces separates the state files.
```bash
# Check current workspace
terraform workspace show
# default

# Create new workspaces
terraform workspace new dev
terraform workspace new prod

# Switch workspace
terraform workspace select dev

# List
terraform workspace list
# default
# * dev
#   prod
```
When using a remote backend (like S3), state file paths are automatically separated per workspace. For example, if your backend configuration is:
```hcl
terraform {
  backend "s3" {
    bucket = "my-tfstate"
    key    = "app/terraform.tfstate"
    region = "ap-northeast-2"
  }
}
```
The actual S3 paths look like this.
```text
s3://my-tfstate/
├── app/terraform.tfstate             # default
└── env:/
    ├── dev/app/terraform.tfstate     # dev workspace
    └── prod/app/terraform.tfstate    # prod workspace
```
In code, you can reference the current workspace name via the terraform.workspace variable.
```hcl
locals {
  environment = terraform.workspace

  instance_count = {
    dev     = 1
    staging = 2
    prod    = 5
  }[terraform.workspace]
}

resource "aws_instance" "app" {
  count         = local.instance_count
  instance_type = local.environment == "prod" ? "m5.large" : "t3.micro"

  tags = {
    Environment = local.environment
  }
}
```
At first glance, it looks clean. One set of code, just switch workspaces to get multiple environments. But in practice, this can become a trap.
## Limitations of Workspaces
Even the official HashiCorp documentation recommends workspaces only for “lightweight separation for slightly different environments.” It implies they’re unsuitable for production separation. Here’s why.
### 1) Shared code means shared risk
Suppose you make a bad code change in the dev workspace. That same code is what prod runs. The model isn't "verify in dev, then promote to prod" — it's "run the same code with a different workspace selected." That makes it hard to add verification steps specifically for production.
### 2) Configuration files aren't separated
Workspaces separate state, but share code. You use terraform.workspace to branch environment-specific settings, but as branches multiply, code becomes complex and readability drops.
```hcl
resource "aws_db_instance" "db" {
  instance_class = {
    dev     = "db.t3.micro"
    staging = "db.t3.medium"
    prod    = "db.r5.xlarge"
  }[terraform.workspace]

  allocated_storage = {
    dev     = 20
    staging = 50
    prod    = 500
  }[terraform.workspace]

  backup_retention_period = {
    dev     = 1
    staging = 7
    prod    = 30
  }[terraform.workspace]

  # ... workspace branching for dozens of attributes
}
```
Even this much is already unpleasant to read. You could extract to variable files, but then it’s not much different from directory separation.
### 3) The backend itself can't be separated
Workspaces only differ in key paths within the same backend. If you want to put the dev state in a different AWS account’s S3 bucket, workspaces can’t do that. For fully separated backends per environment, you need a different approach.
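One such approach (a sketch, not something the article's setup uses) is Terraform's partial backend configuration: leave the backend block empty and supply per-environment settings at init time.

```hcl
# backend.tf -- partial configuration; all settings supplied at init time
terraform {
  backend "s3" {}
}
```

Each environment then initializes with its own settings, e.g. `terraform init -backend-config=backends/dev.s3.tfbackend`, where the file path and name are illustrative. Since each config file can point at a bucket in a different AWS account, this gets around the shared-backend limitation, at the cost of remembering the right flag per environment.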
### 4) Error-prone
The current workspace isn't shown in your shell prompt. What if you run terraform apply thinking you're in dev, but the prod workspace is actually selected? Without guardrails or automation, mistakes like this are easy to make.
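A common guardrail is a small wrapper that refuses to apply unless the active workspace matches one you name explicitly. A minimal sketch (the function name is hypothetical):

```shell
# Hypothetical guard: refuse to apply unless the active workspace
# matches the one named on the command line.
safe_apply() {
  local expected="$1"
  local current
  current="$(terraform workspace show)"
  if [ "$current" != "$expected" ]; then
    echo "refusing: current workspace is '$current', expected '$expected'" >&2
    return 1
  fi
  terraform apply
}
```

Usage: `safe_apply dev` applies only if `terraform workspace show` prints `dev`; otherwise it aborts before touching any state.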
Summary: Workspaces are suitable for short-lived experimental sandboxes within the same team or per-feature preview environments. They’re unsuitable for long-term dev/staging/prod separation.
## Directory-Based Environment Separation
This is the far more common approach in practice. Environments are completely separated by directory.
```text
infra/
├── modules/                   # Reusable modules
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── envs/
    ├── dev/
    │   ├── main.tf            # Module composition
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   └── backend.tf         # Dev state backend
    ├── staging/
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── terraform.tfvars
    │   └── backend.tf         # Staging state backend
    └── prod/
        ├── main.tf
        ├── variables.tf
        ├── terraform.tfvars
        └── backend.tf         # Prod state backend
```
Each environment is an independent directory with an independent backend. Modules are shared, but per-environment settings live within each directory.
```mermaid
flowchart TB
    Modules["modules/\n(Reusable units)"]
    Dev["envs/dev/"]
    Staging["envs/staging/"]
    Prod["envs/prod/"]
    Modules --> Dev
    Modules --> Staging
    Modules --> Prod
    Dev -->|"terraform init/apply\n(from dev directory)"| DevState["dev state\n(dev S3 bucket)"]
    Staging -->|"terraform init/apply\n(from staging directory)"| StagingState["staging state\n(staging S3 bucket)"]
    Prod -->|"terraform init/apply\n(from prod directory)"| ProdState["prod state\n(prod S3 bucket)"]
```
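For example, a dev backend file might look like the sketch below. The bucket and table names are assumptions for illustration; the point is that each environment's backend.tf can reference a bucket in a completely different AWS account.

```hcl
# envs/dev/backend.tf -- sketch; bucket and table names are assumptions
terraform {
  backend "s3" {
    bucket         = "my-dev-tfstate"        # bucket in the dev AWS account
    key            = "app/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "my-dev-tflock"         # state locking
  }
}
```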
Each environment’s main.tf composes modules like this.
```hcl
# envs/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block  = "10.0.0.0/16"
  environment = "prod"
  azs         = ["ap-northeast-2a", "ap-northeast-2c", "ap-northeast-2d"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name = "prod-eks"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids

  node_groups = {
    default = {
      instance_type = "m5.large"
      min_size      = 3
      max_size      = 10
      desired_size  = 5
    }
  }
}
```

```hcl
# envs/dev/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block  = "10.10.0.0/16"   # Different CIDR from prod
  environment = "dev"
  azs         = ["ap-northeast-2a", "ap-northeast-2c"]
}

module "eks" {
  source = "../../modules/eks"

  cluster_name = "dev-eks"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids

  node_groups = {
    default = {
      instance_type = "t3.medium"   # Smaller than prod
      min_size      = 1
      max_size      = 3
      desired_size  = 1
    }
  }
}
```
Same modules, but different parameters per environment. Work is done from within each directory.
```bash
cd envs/dev
terraform init
terraform apply

# Switch to prod
cd ../prod
terraform init
terraform apply
```
### Advantages
- Completely independent state/backend per environment
- Can use different AWS accounts per environment
- Easy to enforce special approval processes for prod only
- Low probability of accidentally applying to the wrong environment
### Disadvantages
- Code duplication (each environment's `main.tf` is similar)
- Backend settings get hardcoded per environment
- Need to run `terraform init` separately for each environment
Most production teams use this approach. The duplication issue is partially mitigated by modularization, but not perfectly. Terragrunt fills this gap.
## A Brief Introduction to Terragrunt
Terragrunt is a Terraform wrapper made by Gruntwork. Its core philosophy is “use Terraform the DRY (Don’t Repeat Yourself) way.”
Terragrunt’s key idea is to declare repeated elements across environment directories (backend settings, providers, common variables) once and inherit them.
Here’s a typical Terragrunt project structure.
```text
infra/
├── terragrunt.hcl                   # Root config (global backend, provider)
├── _envcommon/                      # Common component definitions
│   ├── vpc.hcl
│   └── eks.hcl
└── envs/
    ├── dev/
    │   ├── env.hcl                  # Dev global variables
    │   ├── vpc/
    │   │   └── terragrunt.hcl       # Inherits _envcommon/vpc.hcl
    │   └── eks/
    │       └── terragrunt.hcl
    └── prod/
        ├── env.hcl
        ├── vpc/
        │   └── terragrunt.hcl
        └── eks/
            └── terragrunt.hcl
```
Declare the backend once in the root `terragrunt.hcl`:
```hcl
# infra/terragrunt.hcl
remote_state {
  backend = "s3"

  config = {
    bucket         = "my-company-tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "my-company-tflock"
  }

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
}
```
`path_relative_to_include()` is the key piece. The relative path of each subdirectory automatically becomes the state `key`. Running from envs/dev/vpc automatically sets the key to envs/dev/vpc/terraform.tfstate. Per-environment, per-component state separation comes for free.
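Concretely, running Terragrunt from envs/dev/vpc would generate a backend.tf roughly like this (derived from the root config shown above):

```hcl
# backend.tf generated into envs/dev/vpc by Terragrunt (sketch)
terraform {
  backend "s3" {
    bucket         = "my-company-tfstate"
    key            = "envs/dev/vpc/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "my-company-tflock"
  }
}
```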
Environment settings are declared once:
```hcl
# envs/dev/env.hcl
locals {
  environment = "dev"
  aws_region  = "ap-northeast-2"
  account_id  = "111122223333"
}
```
And here's an actual component declaration:
```hcl
# envs/dev/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

include "envcommon" {
  path = "${get_terragrunt_dir()}/../../../_envcommon/vpc.hcl"
}

locals {
  env = read_terragrunt_config(find_in_parent_folders("env.hcl")).locals
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  cidr_block  = "10.10.0.0/16"
  environment = local.env.environment
}
```
The production VPC’s terragrunt.hcl is nearly identical, with only the CIDR in inputs being different.
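For illustration, the prod counterpart might look like this (a sketch; the prod CIDR value is an assumption):

```hcl
# envs/prod/vpc/terragrunt.hcl -- sketch; only the inputs differ from dev
include "root" {
  path = find_in_parent_folders()
}

include "envcommon" {
  path = "${get_terragrunt_dir()}/../../../_envcommon/vpc.hcl"
}

locals {
  env = read_terragrunt_config(find_in_parent_folders("env.hcl")).locals
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  cidr_block  = "10.0.0.0/16"   # assumed prod CIDR; everything else is identical
  environment = local.env.environment
}
```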
Another powerful feature Terragrunt provides is automatic dependency management.
```hcl
# envs/dev/eks/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}
```
The dependency block automatically retrieves outputs from another Terragrunt project. No need to manually copy VPC outputs or fetch them via data sources.
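If the VPC hasn't been applied yet, its outputs don't exist. Terragrunt's dependency block supports mock outputs for exactly this case; a sketch (the mock values are placeholders):

```hcl
# envs/dev/eks/terragrunt.hcl -- dependency with mock outputs (sketch)
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder values, used only when real outputs aren't available
  # and only for the listed commands
  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```

This lets `terragrunt plan` succeed in a fresh environment before any dependency has been applied.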
### The run-all command
```bash
# Apply all sub-projects in dependency order
terragrunt run-all apply

# A specific environment only
cd envs/dev
terragrunt run-all apply
```
Terragrunt calculates the DAG and auto-applies in VPC → EKS → app order.
### When to adopt Terragrunt
- If there are 2 or fewer environments and things are simple, pure Terraform is sufficient
- If there are 3+ environments or components exceed 10, consider Terragrunt
- If using a multi-account structure (different AWS account per environment), Terragrunt has a clear advantage
- Factor in the learning curve for the entire team
## Strategy Comparison
Here’s a summary of the three strategies.
| Factor | workspace | Directory Separation | Terragrunt |
|---|---|---|---|
| State separation | Path separation within same backend | Fully independent backends | Fully independent backends |
| Multi-AWS account | Difficult | Possible | Possible, easy |
| Code duplication | Low | High (proportional to environments) | Low |
| Backend config | Once | Repeated per environment | Once |
| Learning curve | Low | Low | Medium |
| Dependency automation | None | None | Built-in |
| Large-scale suitability | Low | Medium | High |
### Practical recommendations
- Solo or 2-3 person small team: Start with pure Terraform + directory separation
- Mid-size team, 2-3 environments, 5-10 components: Directory separation + modularization
- Large team or multi-account: Consider Terragrunt adoption
Whichever approach you choose, prod state must always be in a completely separate backend. Separating prod with workspaces is risky in the long term.
## Combining with CI/CD
Environment separation is tightly coupled with CI/CD. A typical pipeline looks like this.
```mermaid
flowchart LR
    PR["PR Created"] --> Plan["terraform plan\n(all environments)"]
    Plan --> Review["Review & Approval"]
    Review --> Merge["Merge to main"]
    Merge --> Dev["dev auto-apply"]
    Dev --> Stg["staging auto-apply"]
    Stg --> Manual["prod manual approval"]
    Manual --> Prod["prod apply"]
```
At PR time, plan results for all environments are posted as comments. After merge, dev and staging auto-apply, while prod goes through a manual approval step. With directory-based separation, you run Terraform individually from each environment directory; with Terragrunt, run-all handles it all at once.
This pipeline is covered in detail in the next part (CI/CD integration).
Environment separation is a balancing point where “minimize code duplication” and “state independence” pull in opposite directions. Choose the right point based on team size and environment complexity. What’s certain is that prod must always have independent state.
In the next part, we’ll cover Kubernetes and Helm providers. We’ll also look at the criteria for deciding whether to manage cluster internals with Terraform or delegate to ArgoCD.

