Table of contents
- Why State Matters
- Limitations of Local State
- Choosing a Remote Backend
- S3 + DynamoDB Backend Configuration
- GCS and Azure Blob Backends
- State Locking
- Drift — When Reality and State Diverge
- terraform state Commands
- Safety Rules for Handling State
Why State Matters
When you first start using Terraform, you notice a file called terraform.tfstate appearing. If you accidentally delete it (or a teammate clones the repo without it), Terraform will try to create the same resources all over again on the next run, even though the AWS instances already exist.
Why does this happen? Terraform stores “what I’ve created so far” in the state. The HCL code represents “the desired final state,” while the state is a snapshot of “what actually exists right now.” Terraform compares the two and only makes API calls for the differences. Without the state, there’s nothing to compare against.
flowchart LR
HCL["HCL Code\n(Desired state)"]
State["State file\n(What was actually created)"]
Cloud["Cloud\n(True current state)"]
HCL -->|Compared during plan| Diff["Calculate diff"]
State -->|Compared during plan| Diff
Diff -->|apply| Cloud
Cloud -.refresh.-> State
So it’s no exaggeration to say that state management is half of Terraform operations. When working alone, a local file is enough, but the moment you become a team, the story changes.
Limitations of Local State
By default, running terraform apply creates a terraform.tfstate file in the current directory. This is local state.
my-infra/
├── main.tf
├── terraform.tfstate # Current state
└── terraform.tfstate.backup # Backup of previous state
This is fine when working alone on a small project. But the moment the team grows to two or more, problems start.
- No concurrent work: If A runs apply while B runs apply at the same time, the state can be corrupted
- Hard to share: You could put the state in Git, but it contains sensitive values (DB passwords, API keys) in plaintext
- No backup: If your laptop dies, the state dies with it
Local state should only be used for “solo prototypes” or “tutorial purposes.” The moment you move to a team, you need to migrate to a remote backend.
Choosing a Remote Backend
Terraform supports several remote backends. Pick one that matches your cloud.
| Backend | Locking Method | Typical Environment |
|---|---|---|
| s3 + DynamoDB | DynamoDB table | AWS |
| gcs | Built-in object locking | GCP |
| azurerm | Blob lease | Azure |
| remote (Terraform Cloud) | Built-in | Multi-cloud, team collaboration |
| http | Depends on implementation | GitLab, self-hosted |
The most commonly used combination is AWS’s s3 + DynamoDB. Store the state file in S3 and use a DynamoDB table for locking.
sequenceDiagram
participant A as Developer A
participant B as Developer B
participant DDB as DynamoDB\n(Lock)
participant S3 as S3\n(State)
A->>DDB: Attempt to acquire lock
DDB-->>A: OK (Lock ID issued)
A->>S3: Read state
B->>DDB: Attempt to acquire lock
DDB-->>B: Already locked (wait)
A->>S3: Write state after apply
A->>DDB: Release lock
DDB-->>B: OK (now available)
B->>S3: Read state
While one person is working, the other automatically waits. This is a safety mechanism that prevents accidental overwrites.
S3 + DynamoDB Backend Configuration
For AWS environments, this combination is the de facto standard. Let’s set it up step by step.
First, you need to create the S3 bucket and DynamoDB table for state storage ahead of time. These resources are typically created manually rather than with Terraform: managing "the place that holds state" with Terraform itself is a chicken-and-egg problem.
# Create S3 bucket
aws s3api create-bucket \
--bucket my-company-tfstate \
--region ap-northeast-2 \
--create-bucket-configuration LocationConstraint=ap-northeast-2
# Enable versioning — allows recovery even if state is accidentally overwritten
aws s3api put-bucket-versioning \
--bucket my-company-tfstate \
--versioning-configuration Status=Enabled
# Set up encryption
aws s3api put-bucket-encryption \
--bucket my-company-tfstate \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}]
}'
# Create DynamoDB table (for locking)
aws dynamodb create-table \
--table-name my-company-tflock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
Versioning is a must. If the state gets corrupted, you can roll back to a previous version. Encryption is also essential. State can contain sensitive information like DB passwords.
Now declare the backend in your Terraform code.
# main.tf
terraform {
required_version = ">= 1.6.0"
backend "s3" {
bucket = "my-company-tfstate"
key = "prod/vpc/terraform.tfstate"
region = "ap-northeast-2"
dynamodb_table = "my-company-tflock"
encrypt = true
}
}
key is the path within the bucket. Organizing by environment and service makes management easier: prod/vpc/, prod/eks/, dev/vpc/, and so on.
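A related pattern worth knowing is partial backend configuration: leave the environment-specific values out of the backend block and inject them at init time, so the same code can point at different buckets per environment. A minimal sketch (the .tfbackend file name here is just a convention, not something from the setup above):

```hcl
# main.tf: bucket, region, and lock table intentionally omitted
terraform {
  backend "s3" {
    key     = "prod/vpc/terraform.tfstate"
    encrypt = true
  }
}

# prod.s3.tfbackend (hypothetical file name), supplied at init time:
#   terraform init -backend-config=prod.s3.tfbackend
#
# bucket         = "my-company-tfstate"
# region         = "ap-northeast-2"
# dynamodb_table = "my-company-tflock"
```

This keeps secrets and environment details out of version-controlled code while the key layout stays visible in main.tf.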
The first time you set up a backend, and whenever you change it, you need to run terraform init.
terraform init
# If there's existing local state, it will ask if you want to copy it to the remote
Initializing the backend...
Do you want to copy existing state to the new backend? (yes/no)
Answer yes and the local state will be uploaded to S3. From then on, all operations work based on the S3 state.
GCS and Azure Blob Backends
If you use GCP, the gcs backend is the standard. It's similar to S3 but doesn't require a separate locking table like DynamoDB: the backend handles locking itself, using a lock object in the state bucket.
terraform {
backend "gcs" {
bucket = "my-company-tfstate"
prefix = "prod/vpc"
}
}
Azure uses the azurerm backend. It handles locking via Blob Storage’s lease feature.
terraform {
backend "azurerm" {
resource_group_name = "tfstate-rg"
storage_account_name = "mycompanytfstate"
container_name = "tfstate"
key = "prod/vpc/terraform.tfstate"
}
}
Regardless of which backend you use, the core principles are the same: enable versioning, encrypt, activate locking, and minimize access permissions.
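As a sketch of that last principle, here is roughly what a least-privilege policy for the S3 + DynamoDB setup might look like in HCL. The policy name, account ID, and paths are placeholders; the actions cover what the s3 backend needs: listing the bucket, reading and writing the state object, and managing lock items.

```hcl
# Hypothetical least-privilege policy for one state path (prod/vpc)
resource "aws_iam_policy" "tfstate_access" {
  name = "tfstate-prod-vpc-access" # placeholder name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:ListBucket"]
        Resource = "arn:aws:s3:::my-company-tfstate"
      },
      {
        # Read/write only this project's state objects
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "arn:aws:s3:::my-company-tfstate/prod/vpc/*"
      },
      {
        # Acquire and release locks in the DynamoDB table
        Effect   = "Allow"
        Action   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
        Resource = "arn:aws:dynamodb:ap-northeast-2:123456789012:table/my-company-tflock"
      }
    ]
  })
}
```

Scoping the S3 resource to a prefix means a compromised dev credential can't touch prod state.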
State Locking
Locking is automatically applied when running commands that read or write state, such as terraform apply or terraform plan. It’s automatically released when the command completes. Developers rarely need to manually intervene.
But sometimes accidents happen. If you kill the process with Ctrl+C during apply, the network disconnects, or a CI pipeline gets interrupted, the lock remains unreleased. The next command will produce an error like this.
Error: Error acquiring the state lock
Lock Info:
ID: abc123-def456
Path: my-company-tfstate/prod/vpc/terraform.tfstate
Operation: OperationTypeApply
Who: kai@laptop
Created: 2026-04-18 10:23:15 +0900 KST
In this case, first check with the team whether someone is actually working on it, and if not, manually release the lock.
terraform force-unlock abc123-def456
You need to use the ID shown in Lock Info. force-unlock is exactly what it sounds like — a forced release, so if someone was actually in the middle of an apply, the state could get corrupted. Always confirm with the team before executing.
Drift — When Reality and State Diverge
Suppose you created a resource with Terraform, and then someone changes the configuration directly in the AWS console. Now the state file’s content and the actual cloud state are misaligned. This is called drift.
How do you detect drift?
terraform plan -refresh-only
The -refresh-only flag reads the actual resource state and compares it with the state. It doesn’t apply any changes.
Terraform will perform the following actions:
# aws_security_group.web will be updated in-place
~ resource "aws_security_group" "web" {
~ ingress = [
- {
cidr_blocks = ["10.0.0.0/16"]
from_port = 22
to_port = 22
protocol = "tcp"
},
+ {
cidr_blocks = ["0.0.0.0/0"] # Someone opened it via the console
from_port = 22
to_port = 22
protocol = "tcp"
},
]
}
The example above shows a situation where someone opened the security group’s SSH port to the entire internet. This could lead to a serious security incident.
When drift is detected, there are two options.
- Update the Terraform code to match the actual state: if the console change was intentional, reflect it in code
- Run terraform apply to revert to the code's state: if the change was unintentional, restore the original
Many teams set up CI pipelines that run -refresh-only periodically (daily or weekly) to detect drift. If you can’t completely block console access, you should at least detect changes.
terraform state Commands
There are times when you need to directly manipulate the state. For example, when renaming a resource, removing a resource from state, or moving it to a different state. The terraform state subcommand is used for these tasks.
flowchart TB
state["terraform state"]
state --> list["list\n(List resources in state)"]
state --> show["show\n(Resource details)"]
state --> mv["mv\n(Rename/move resource)"]
state --> rm["rm\n(Remove from state)"]
state --> pull["pull\n(Dump state as JSON)"]
state --> push["push\n(Overwrite state)"]
list — See what’s in the state
terraform state list
aws_s3_bucket.logs
aws_security_group.web
aws_instance.app[0]
aws_instance.app[1]
module.vpc.aws_vpc.main
A quick way to see which resources are being managed.
show — View detailed state for a specific resource
terraform state show aws_security_group.web
# aws_security_group.web:
resource "aws_security_group" "web" {
id = "sg-0abc123def456"
name = "web-sg"
vpc_id = "vpc-0123456789"
ingress = [...]
...
}
Useful for debugging. You can see exactly what the state looks like.
mv — Rename a resource or move it into a module
If you rename a resource in code, Terraform will try to delete the existing resource and create it again with the new name. This is absolutely not what you want. By using state mv to rename it within the state, the actual cloud resource isn’t touched, and Terraform recognizes “oh, it’s the same resource.”
# Rename resource: aws_instance.web → aws_instance.app
terraform state mv aws_instance.web aws_instance.app
# Move into a module: aws_vpc.main → module.network.aws_vpc.main
terraform state mv aws_vpc.main module.network.aws_vpc.main
A frequently used feature during refactoring.
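Since Terraform 1.1 there is also a declarative alternative: a moved block in the code records the rename, so teammates and CI pick it up automatically on their next plan instead of everyone running state mv by hand. A minimal sketch for the rename above:

```hcl
# Recorded in code; Terraform treats the old address as the same resource
moved {
  from = aws_instance.web
  to   = aws_instance.app
}

resource "aws_instance" "app" {
  # ... unchanged attributes ...
}
```

Once the change has been applied everywhere, the moved block can be deleted.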
rm — Remove from state only (actual resource stays)
Use this when you want to keep the actual cloud resource but remove it from Terraform management. Useful when transferring to manual management or migrating to a different Terraform project.
terraform state rm aws_instance.legacy
After this command, Terraform no longer knows about that resource. If the resource block is still in your code, the next apply will try to create a duplicate, so remove the code at the same time, and always check the impact with terraform plan before using this.
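On Terraform 1.7 or newer, the same intent can be expressed declaratively with a removed block (a sketch; the matching resource block is deleted from the code at the same time):

```hcl
# Forget the resource from state without destroying it in the cloud
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false
  }
}
```

Because the removal goes through a normal plan/apply, it is visible in code review, unlike a one-off state rm on someone's laptop.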
import — Bring an existing resource into state
Used when bringing a resource created via the console under Terraform management.
# First write the HCL code
resource "aws_instance" "legacy" {
# (Leave attributes empty or roughly fill them in)
}
# Pass the actual resource ID
terraform import aws_instance.legacy i-0abc123def456
After import, check the diff between code and actual state with terraform plan, then refine the code to match. Starting from Terraform 1.5, you can also declare import blocks in HCL for a declarative approach.
import {
to = aws_instance.legacy
id = "i-0abc123def456"
}
resource "aws_instance" "legacy" {
# ...
}
Safety Rules for Handling State
Finally, here are the fundamental principles for not corrupting state.
- Don't commit state files to Git: they contain sensitive information in plaintext. Add *.tfstate and *.tfstate.backup to .gitignore
- Use a remote backend with backups: S3 or GCS object versioning is a must
- Enable locking: it prevents concurrent-work accidents
- Think twice before state rm or force-unlock: both are dangerous and hard to undo
- Don't manually edit state: hand-modifying JSON pulled with terraform state pull should be a last resort
Managing state well is half of Terraform operations. If you’re working with a team, a remote backend is not optional — it’s mandatory.
In the next part, we’ll cover Terraform modules. We’ll look at how to bundle repeating infrastructure patterns into reusable units and how to leverage the official registry.