ioob.dev

Terraform Part 8 — State Management

Terraform Series (8/15)
  1. Terraform Part 1 — What Is Terraform
  2. Terraform Part 2 — Installation and First Deploy
  3. Terraform Part 3 — HCL Syntax
  4. Terraform Part 4 — Variables and Outputs
  5. Terraform Part 5 — Providers
  6. Terraform Part 6 — Resources and Dependencies
  7. Terraform Part 7 — Data Sources and Import
  8. Terraform Part 8 — State Management
  9. Terraform Part 9 — Modules
  10. Terraform Part 10 — Loops and Conditionals
  11. Terraform Part 11 — Workspaces and Environment Separation
  12. Terraform Part 12 — Kubernetes and Helm Providers
  13. Terraform Part 13 — CI/CD Integration
  14. Terraform Part 14 — Testing and Policy
  15. Terraform Part 15 — Practical Patterns and Pitfalls

Why State Matters

When you first start using Terraform, you notice a file called terraform.tfstate appearing. If you accidentally delete it, Terraform will try to create the same resources all over again on the next run, even though the AWS instance already exists.

Why does this happen? Terraform stores “what I’ve created so far” in the state. The HCL code represents “the desired final state,” while the state is a record of what Terraform believes exists, as of its last run. During plan, Terraform refreshes that record against the real infrastructure, compares it with the code, and only makes API calls for the differences. Without the state, there’s nothing to compare against.

flowchart LR
    HCL["HCL Code\n(Desired state)"]
    State["State file\n(What was actually created)"]
    Cloud["Cloud\n(True current state)"]

    HCL -->|Compared during plan| Diff["Calculate diff"]
    State -->|Compared during plan| Diff
    Diff -->|apply| Cloud
    Cloud -.refresh.-> State

So it’s no exaggeration to say that state management is half of Terraform operations. When working alone, a local file is enough, but the moment you become a team, the story changes.

Limitations of Local State

By default, running terraform apply creates a terraform.tfstate file in the current directory. This is local state.

my-infra/
├── main.tf
├── terraform.tfstate       # Current state
└── terraform.tfstate.backup # Backup of previous state

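This is fine when working alone on a small project. But the moment the team grows to two or more, problems start: everyone has a different copy of the state on their own machine, nothing stops two people from running apply at the same time, and the file (which can contain secrets in plain text) ends up being passed around or committed to Git.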

Local state should only be used for “solo prototypes” or “tutorial purposes.” The moment you move to a team, you need to migrate to a remote backend.

Choosing a Remote Backend

Terraform supports several remote backends. Pick one that matches your cloud.

| Backend | Locking Method | Typical Environment |
|---|---|---|
| s3 + DynamoDB | DynamoDB table | AWS |
| gcs | Built-in object locking | GCP |
| azurerm | Blob lease | Azure |
| remote (Terraform Cloud) | Built-in | Multi-cloud, team collaboration |
| http | Depends on implementation | GitLab, self-hosted |

The most commonly used combination is AWS’s s3 + DynamoDB. Store the state file in S3 and use a DynamoDB table for locking.

sequenceDiagram
    participant A as Developer A
    participant B as Developer B
    participant DDB as DynamoDB\n(Lock)
    participant S3 as S3\n(State)

    A->>DDB: Attempt to acquire lock
    DDB-->>A: OK (Lock ID issued)
    A->>S3: Read state
    B->>DDB: Attempt to acquire lock
    DDB-->>B: Already locked (wait)
    A->>S3: Write state after apply
    A->>DDB: Release lock
    DDB-->>B: OK (now available)
    B->>S3: Read state

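While one person is working, the other is locked out. By default the second command fails immediately with a lock error; pass -lock-timeout (for example -lock-timeout=5m) if you want it to wait and retry as in the diagram. Either way, this is a safety mechanism that prevents accidental overwrites.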

S3 + DynamoDB Backend Configuration

For AWS environments, this combination is the de facto standard. Let’s set it up step by step.

First, you need to create the S3 bucket and DynamoDB table for state storage ahead of time. These resources are typically created manually rather than with Terraform. Creating “the space to hold state” with Terraform creates a chicken-and-egg problem.

# Create S3 bucket
aws s3api create-bucket \
  --bucket my-company-tfstate \
  --region ap-northeast-2 \
  --create-bucket-configuration LocationConstraint=ap-northeast-2

# Enable versioning — allows recovery even if state is accidentally overwritten
aws s3api put-bucket-versioning \
  --bucket my-company-tfstate \
  --versioning-configuration Status=Enabled

# Set up encryption
aws s3api put-bucket-encryption \
  --bucket my-company-tfstate \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'

# Create DynamoDB table (for locking), in the same region the backend block will use
aws dynamodb create-table \
  --table-name my-company-tflock \
  --region ap-northeast-2 \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

Versioning is a must. If the state gets corrupted, you can roll back to a previous version. Encryption is also essential. State can contain sensitive information like DB passwords.
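
If you would rather keep these bootstrap resources in code, one option is a small, separate configuration that is applied once and keeps its own state locally. The following is only a sketch of that idea, assuming AWS provider v4 or later. The resource labels (tfstate, tflock) and the prevent_destroy guard are illustrative choices, and the bucket and table names simply mirror the CLI commands above.

```hcl
# bootstrap/main.tf: applied once, with local state (an assumption of this sketch)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}

provider "aws" {
  region = "ap-northeast-2"
}

# Bucket that will hold every state file
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-company-tfstate"

  lifecycle {
    prevent_destroy = true # guard against deleting the state bucket by accident
  }
}

# Versioning, so an overwritten state can be rolled back
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Default server-side encryption, matching the AES256 setting used in the CLI version
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Lock table: the S3 backend expects a string hash key named LockID
resource "aws_dynamodb_table" "tflock" {
  name         = "my-company-tflock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

This does not remove the chicken-and-egg problem. It only shrinks it to one tiny configuration whose local state you have to keep somewhere safe.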

Now declare the backend in your Terraform code.

# main.tf
terraform {
  required_version = ">= 1.6.0"

  backend "s3" {
    bucket         = "my-company-tfstate"
    key            = "prod/vpc/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "my-company-tflock"
    encrypt        = true
  }
}

key is the path within the bucket. Organizing by environment and service makes management easier: prod/vpc/, prod/eks/, dev/vpc/, and so on.
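
Newer Terraform releases can also take the lock in S3 itself, without a DynamoDB table. As a hedged sketch, assuming Terraform 1.10 or later (where the S3 backend gained the use_lockfile argument), the same backend might be declared like this:

```hcl
terraform {
  required_version = ">= 1.10.0"

  backend "s3" {
    bucket       = "my-company-tfstate"
    key          = "prod/vpc/terraform.tfstate"
    region       = "ap-northeast-2"
    encrypt      = true
    use_lockfile = true # lock through a .tflock object stored next to the state file
  }
}
```

The rest of this part keeps using the DynamoDB variant, since that is what most existing projects still run.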

When setting up or changing the backend for the first time, you need to run terraform init.

terraform init

# If there's existing local state, it will ask if you want to copy it to the remote
Initializing the backend...
Do you want to copy existing state to the new backend? (yes/no)

Answer yes and the local state will be uploaded to S3. From then on, all operations work based on the S3 state.

GCS and Azure Blob Backends

If you use GCP, the gcs backend is the standard. It’s similar to S3 but doesn’t require a separate locking table like DynamoDB. GCS natively supports object-level locking.

terraform {
  backend "gcs" {
    bucket = "my-company-tfstate"
    prefix = "prod/vpc"
  }
}

Azure uses the azurerm backend. It handles locking via Blob Storage’s lease feature.

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "mycompanytfstate"
    container_name       = "tfstate"
    key                  = "prod/vpc/terraform.tfstate"
  }
}

Regardless of which backend you use, the core principles are the same: enable versioning, encrypt, activate locking, and minimize access permissions.
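
For the AWS backend shown earlier, "minimize access permissions" could be sketched as an IAM policy scoped to a single state key and the lock table. The account ID 123456789012 and the policy name are placeholders, and the action list reflects my understanding of what the S3 backend needs with DynamoDB locking, so check it against the backend documentation for your Terraform version.

```hcl
# Hypothetical least-privilege policy for one state file (prod/vpc) and the lock table
data "aws_iam_policy_document" "tfstate_access" {
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::my-company-tfstate"]
  }

  statement {
    sid       = "ReadWriteStateObject"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::my-company-tfstate/prod/vpc/terraform.tfstate"]
  }

  statement {
    sid = "StateLockTable"
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    # Placeholder account ID; replace with your own
    resources = ["arn:aws:dynamodb:ap-northeast-2:123456789012:table/my-company-tflock"]
  }
}

resource "aws_iam_policy" "tfstate_access" {
  name   = "tfstate-prod-vpc-access" # hypothetical name
  policy = data.aws_iam_policy_document.tfstate_access.json
}
```

Scoping the object statement to one key means a role that manages prod/vpc cannot read or overwrite the state of prod/eks.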

State Locking

Locking is automatically applied when running commands that read or write state, such as terraform apply or terraform plan. It’s automatically released when the command completes. Developers rarely need to manually intervene.

But sometimes accidents happen. If you kill the process with Ctrl+C during apply, the network disconnects, or a CI pipeline gets interrupted, the lock remains unreleased. The next command will produce an error like this.

Error: Error acquiring the state lock

Lock Info:
  ID:        abc123-def456
  Path:      my-company-tfstate/prod/vpc/terraform.tfstate
  Operation: OperationTypeApply
  Who:       kai@laptop
  Created:   2026-04-18 10:23:15 +0900 KST

In this case, first check with the team whether someone is actually working on it, and if not, manually release the lock.

terraform force-unlock abc123-def456

You need to use the ID shown in Lock Info. force-unlock is exactly what it sounds like — a forced release, so if someone was actually in the middle of an apply, the state could get corrupted. Always confirm with the team before executing.

Drift — When Reality and State Diverge

Suppose you created a resource with Terraform, and then someone changes the configuration directly in the AWS console. Now the state file’s content and the actual cloud state are misaligned. This is called drift.

How do you detect drift?

terraform plan -refresh-only

The -refresh-only flag reads the real resources from the provider and compares them with what is recorded in the state. It doesn’t propose or apply any changes to the infrastructure itself.

Note: Objects have changed outside of Terraform

  # aws_security_group.web has changed
  ~ resource "aws_security_group" "web" {
      ~ ingress = [
          - {
              cidr_blocks = ["10.0.0.0/16"]
              from_port   = 22
              to_port     = 22
              protocol    = "tcp"
            },
          + {
              cidr_blocks = ["0.0.0.0/0"]  # Someone opened it via the console
              from_port   = 22
              to_port     = 22
              protocol    = "tcp"
            },
        ]
    }

The example above shows a situation where someone opened the security group’s SSH port to the entire internet. This could lead to a serious security incident.

When drift is detected, there are two options.

  1. Update Terraform code to match the actual state: If the console change was intentional, reflect it in code. Running terraform apply -refresh-only records the new values in the state without touching the resource.
  2. Run terraform apply to revert to the code state: If the change was unintentional, restore the original

Many teams set up CI pipelines that run terraform plan -refresh-only on a schedule (daily or weekly) to detect drift. If you can’t completely block console access, you should at least detect changes.

terraform state Commands

There are times when you need to directly manipulate the state. For example, when renaming a resource, removing a resource from state, or moving it to a different state. The terraform state subcommand is used for these tasks.

flowchart TB
    state["terraform state"]
    state --> list["list\n(List resources in state)"]
    state --> show["show\n(Resource details)"]
    state --> mv["mv\n(Rename/move resource)"]
    state --> rm["rm\n(Remove from state)"]
    state --> pull["pull\n(Dump state as JSON)"]
    state --> push["push\n(Overwrite state)"]

list — See what’s in the state

terraform state list
aws_s3_bucket.logs
aws_security_group.web
aws_instance.app[0]
aws_instance.app[1]
module.vpc.aws_vpc.main

A quick way to see which resources are being managed.

show — View detailed state for a specific resource

terraform state show aws_security_group.web
# aws_security_group.web:
resource "aws_security_group" "web" {
    id          = "sg-0abc123def456"
    name        = "web-sg"
    vpc_id      = "vpc-0123456789"
    ingress     = [...]
    ...
}

Useful for debugging. You can see exactly what the state looks like.

mv — Rename a resource or move it into a module

If you rename a resource in code, Terraform will try to delete the existing resource and create it again with the new name. This is absolutely not what you want. By using state mv to rename it within the state, the actual cloud resource isn’t touched, and Terraform recognizes “oh, it’s the same resource.”

# Rename resource: aws_instance.web → aws_instance.app
terraform state mv aws_instance.web aws_instance.app

# Move into a module: aws_vpc.main → module.network.aws_vpc.main
terraform state mv aws_vpc.main module.network.aws_vpc.main

A frequently used feature during refactoring.
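
The two renames above can also be expressed declaratively. As a sketch, assuming Terraform 1.1 or later, moved blocks placed alongside the renamed resource tell Terraform to update the addresses in state during the next plan and apply, so teammates and CI pick up the rename without anyone running state mv by hand:

```hcl
# Rename: aws_instance.web is now declared as aws_instance.app
moved {
  from = aws_instance.web
  to   = aws_instance.app
}

# Move into a module: aws_vpc.main now lives in module.network
moved {
  from = aws_vpc.main
  to   = module.network.aws_vpc.main
}
```

With these in place, terraform plan should report the resources as moved rather than as a destroy followed by a create.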

rm — Remove from state only (actual resource stays)

Use this when you want to keep the actual cloud resource but remove it from Terraform management. Useful when transferring to manual management or migrating to a different Terraform project.

terraform state rm aws_instance.legacy

After this command, Terraform no longer knows about that resource. If its resource block is still in your code, the next apply will try to create a new copy, so delete the block as well and check the result with terraform plan before applying.
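
Since Terraform 1.7 the same removal can also be declared in code. A minimal sketch, assuming the aws_instance.legacy resource block has already been deleted from the configuration:

```hcl
# Terraform 1.7+: forget the resource without destroying it
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false # drop it from state only; the real instance keeps running
  }
}
```

Because the removal goes through a normal plan and apply, it shows up in code review instead of happening silently on one person's machine.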

import — Bring an existing resource into state

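Used when bringing a resource created via the console under Terraform management. Strictly speaking, import is a top-level command (terraform import) rather than a terraform state subcommand, but it belongs here because it also edits the state directly.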

# First write the HCL code
resource "aws_instance" "legacy" {
  # (Leave attributes empty or roughly fill them in)
}

# Pass the actual resource ID
terraform import aws_instance.legacy i-0abc123def456

After import, check the diff between code and actual state with terraform plan, then refine the code to match. Starting from Terraform 1.5, you can also declare import blocks in HCL for a declarative approach.

import {
  to = aws_instance.legacy
  id = "i-0abc123def456"
}

resource "aws_instance" "legacy" {
  # ...
}

Safety Rules for Handling State

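Finally, here are the fundamental principles for not corrupting state.

  1. Use a remote backend with locking as soon as more than one person touches the infrastructure.
  2. Enable versioning and encryption on the state storage, and keep access permissions minimal.
  3. Confirm with the team before running terraform force-unlock.
  4. Check the impact with terraform plan before running terraform state mv or terraform state rm.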

Managing state well is half of Terraform operations. If you’re working with a team, a remote backend is not optional — it’s mandatory.


In the next part, we’ll cover Terraform modules. We’ll look at how to bundle repeating infrastructure patterns into reusable units and how to leverage the official registry.

Part 9: Modules

