Why CI/CD Is Necessary
Running Terraform from your laptop alone is fast and convenient. But the moment you become a team, problems pile up.
- No audit trail of who applied what and when: Makes root cause analysis difficult when incidents occur
- Local credentials required: Every team member needs production AWS keys
- No approval process: Ad-hoc changes to production become easy
- State lock conflicts: Concurrent work by multiple people gets tangled
A CI/CD pipeline addresses all of these issues at once. Every change goes through a PR, plan results are visible to reviewers, and apply only happens after approval. Credentials exist only on the CI server, not on individual laptops.
```mermaid
flowchart LR
    Dev["Developer"] -->|"Create PR"| PR["Pull Request"]
    PR -->|"Auto-triggered"| Plan["terraform plan"]
    Plan -->|"Result as comment"| Review["Code Review"]
    Review -->|"Approve & merge"| Merge["main branch"]
    Merge -->|"Auto or manual trigger"| Apply["terraform apply"]
    Apply --> Infra["Cloud Changes"]
```
In this part, we’ll look at two representative approaches: GitHub Actions and Atlantis.
GitHub Actions Basic Pipeline
This is the most common approach. Place a workflow YAML file in the GitHub repo, and GitHub runs Terraform automatically in response to PRs and pushes.
Let’s start with the basic directory structure.
```
infra/
├── .github/
│   └── workflows/
│       └── terraform.yml
├── envs/
│   ├── dev/
│   └── prod/
└── modules/
```
A simple workflow example.
```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths:
      - 'envs/**'
      - 'modules/**'
  push:
    branches: [main]
    paths:
      - 'envs/**'
      - 'modules/**'

permissions:
  contents: read
  pull-requests: write
  id-token: write # For OIDC

jobs:
  plan:
    name: Plan (${{ matrix.env }})
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    strategy:
      fail-fast: false
      matrix:
        env: [dev, prod]
    defaults:
      run:
        working-directory: envs/${{ matrix.env }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/github-actions-tf-${{ matrix.env }}
          aws-region: ap-northeast-2
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
      - name: Format Check
        run: terraform fmt -check -recursive
      - name: Init
        run: terraform init
      - name: Validate
        run: terraform validate
      - name: Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        continue-on-error: true # let the comment step run; the job fails later
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        env:
          PLAN: ${{ steps.plan.outputs.stdout }} # output provided by the setup-terraform wrapper
        with:
          script: |
            const output = `#### Terraform Plan: \`${{ matrix.env }}\` 📖
            \`\`\`
            ${process.env.PLAN}
            \`\`\`
            `;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });
      - name: Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

  apply:
    name: Apply (${{ matrix.env }})
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    # No needs: [plan] here. plan is skipped on push events, and a job that
    # needs a skipped job is skipped as well, so apply would never run.
    strategy:
      max-parallel: 1 # apply dev first, then prod
      matrix:
        env: [dev, prod]
    environment:
      name: ${{ matrix.env }} # prod uses manual approval gate
    defaults:
      run:
        working-directory: envs/${{ matrix.env }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/github-actions-tf-${{ matrix.env }}
          aws-region: ap-northeast-2
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
      - name: Init
        run: terraform init
      - name: Apply
        run: terraform apply -auto-approve
```
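Here is a summary of the flow. When a PR is opened, plan runs for every environment and the results are posted as PR comments. After the merge, apply runs from main. Prod waits at an approval gate defined by a GitHub Environment, which you set up by configuring required reviewers on the prod environment in the repository settings.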
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant CI as GitHub Actions
    participant AWS as AWS
    Dev->>GH: Create PR
    GH->>CI: pull_request event
    CI->>AWS: OIDC authentication
    CI->>AWS: terraform plan
    AWS-->>CI: Plan result
    CI->>GH: Post plan as PR comment
    Dev->>GH: Review & approve, merge
    GH->>CI: push event
    CI->>CI: Auto-run dev apply
    CI->>GH: Wait for prod environment approval
    Dev->>GH: Manual approval
    GH->>CI: Resume apply
    CI->>AWS: terraform apply
```
Removing Credentials with OIDC
In the example above, aws-actions/configure-aws-credentials authenticates with AWS via OIDC without access keys. This is the standard approach today.
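GitHub Actions assumes an IAM Role. You register GitHub’s OIDC provider in AWS once, then create an IAM Role that trusts that provider. The Role’s trust policy checks the token’s sub claim, so that only a specific repository, running in specific contexts, can assume the role. The sub value depends on how the job runs. A job that declares a GitHub Environment presents repo:ORG/REPO:environment:NAME instead of a branch ref, and a pull_request run presents repo:ORG/REPO:pull_request. The policy below therefore lists the two claims that the prod plan and apply jobs in the workflow above actually send. A policy that only allowed ref:refs/heads/main would reject both jobs.
```hcl
# GitHub OIDC provider
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
  thumbprint_list = [
    "6938fd4d98bab03faadb97b34396831e3780aea1",
    "1c58a3a8518e8759bf075b76b750d4f2df264fcd",
  ]
}

# IAM Role trust policy
data "aws_iam_policy_document" "github_actions_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # sub is "repo:ORG/REPO:environment:NAME" for jobs with an environment,
    # "repo:ORG/REPO:pull_request" for PR runs, "repo:ORG/REPO:ref:<ref>" otherwise
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values = [
        "repo:my-org/infra-repo:environment:prod", # apply job (environment: prod)
        "repo:my-org/infra-repo:pull_request",     # plan job on PRs
      ]
    }
  }
}

resource "aws_iam_role" "github_actions_tf" {
  name               = "github-actions-tf-prod"
  assume_role_policy = data.aws_iam_policy_document.github_actions_assume.json
}
```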
With this setup, there is no need to store AWS access keys in GitHub Actions. There are no long-lived keys to rotate, and the temporary credentials issued for each run expire on their own, so a leak does far less damage.
Showing Plan Results as PR Comments
The Comment Plan on PR step in the workflow above handles this. Reviewers can immediately see “what this change will actually do” when opening a PR.
#### Terraform Plan: `prod` 📖
```
Terraform will perform the following actions:

  # aws_security_group_rule.allow_https will be created
  + resource "aws_security_group_rule" "allow_https" {
      + from_port         = 443
      + to_port           = 443
      + protocol          = "tcp"
      + cidr_blocks       = ["0.0.0.0/0"]
      + security_group_id = "sg-0abc123"
      + type              = "ingress"
    }

Plan: 1 to add, 0 to change, 0 to destroy.
```
This noticeably improves code review. Reviewers no longer have to guess what a PR will actually create, and when a destroy shows up in the plan, it stands out immediately.
When the plan output is too long, the comment exceeds GitHub’s comment size limit (65,536 characters) and cannot be posted. A common approach is to post only a summary, or to truncate the output and collapse it in a <details> tag.
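```javascript
// Runs inside actions/github-script; Actions expands ${{ matrix.env }} before the script runs
const plan = process.env.PLAN || '';
// Keep the body well under GitHub's 65,536-character comment limit
const planSummary = plan.length > 50000
  ? plan.slice(0, 50000) + '\n... (truncated)'
  : plan;
const output = `#### Plan: \`${{ matrix.env }}\`
<details><summary>View details</summary>

\`\`\`diff
${planSummary}
\`\`\`
</details>`;
```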
Secret Management
Sensitive information should stay out of Terraform code and, wherever possible, out of state. CI mainly has to handle two kinds of secrets.
1) Cloud credentials
OIDC can eliminate these (see above). If you absolutely must use access keys, store them in GitHub Secrets with per-environment access controls.
2) Sensitive Terraform variables
These include DB passwords and external API keys. In CI, pass them through environment variables:
```yaml
- name: Apply
  env:
    TF_VAR_db_password: ${{ secrets.DB_PASSWORD }}
    TF_VAR_slack_webhook: ${{ secrets.SLACK_WEBHOOK }}
  run: terraform apply -auto-approve
```
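Terraform reads any environment variable named TF_VAR_<variable_name> as the value of the matching input variable, so there is no need to write these values in plaintext .tfvars files. The variable still has to be declared in the configuration, and declaring it with sensitive = true keeps its value out of plan output.
However, any such value that ends up in a resource attribute is still stored as plaintext in the state file. Protecting the state is the backend’s job: enable server-side encryption on the S3 backend and restrict who can read the bucket.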
To go further, you can keep secrets outside Terraform entirely. Store them in AWS Secrets Manager or HashiCorp Vault, and Terraform only references them.
```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/rds/password"
}

resource "aws_db_instance" "db" {
  # ... engine, instance_class, username, and other required arguments omitted
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```
It still gets recorded in state, but when rotation is needed, you just change it in Secrets Manager and re-apply.
Atlantis — PR-Based Terraform Automation
GitHub Actions is often enough, but there is also a dedicated tool built specifically for Terraform automation: Atlantis.
Atlantis drives Terraform through PR comments. When you comment atlantis plan on a PR, Atlantis runs plan and posts the result back to the PR. Commenting atlantis apply runs apply. With autoplan enabled, plan also runs automatically whenever a PR modifies matching files.
```mermaid
flowchart LR
    PR["PR Created"] --> Auto["Atlantis auto-plan\n(directory locked)"]
    Auto --> Comment["Plan comment on PR"]
    Comment --> Human["Reviewer checks"]
    Human -->|"atlantis apply\ncomment"| Apply["Atlantis runs apply"]
    Apply --> Merge["PR merged\n(lock released)"]
```
Atlantis’s advantages:
- Per-directory locking: If two PRs touch the same directory and workspace, the second PR cannot plan until the first is applied and merged (or its lock is released)
- Checks that the plan is still current before apply: If the state changed after plan (for example, because someone else applied), it requires a fresh plan
- Fine-grained permission control: Only specific teams can apply to specific directories
- Multi-repo/multi-workspace support
The basic configuration file is atlantis.yaml at the repo root.
```yaml
# atlantis.yaml
version: 3
automerge: false
projects:
  - name: dev
    dir: envs/dev
    autoplan:
      when_modified:
        - "*.tf"
        - "../../modules/**/*.tf"
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: prod
    dir: envs/prod
    autoplan:
      when_modified:
        - "*.tf"
        - "../../modules/**/*.tf"
      enabled: true
    apply_requirements:
      - approved
      - mergeable
    workflow: prod-workflow
workflows:
  prod-workflow:
    plan:
      steps:
        - init
        - plan
    apply:
      steps:
        - run: echo "Production apply — exercise extreme caution"
        - apply
```
Setting approved in apply_requirements means apply is allowed only after the PR has been approved. mergeable requires GitHub to consider the PR mergeable, which means no conflicts and all required status checks passing.
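By default, Atlantis does not let a repo's atlantis.yaml set apply_requirements or use a custom workflow such as prod-workflow. The server-side repo config has to allow both. The following is a minimal sketch of that server-side file. It assumes the file is passed to the server with --repo-config, and the repo ID is a hypothetical placeholder.

```yaml
# repos.yaml: server-side repo config, passed to the Atlantis server with --repo-config
repos:
  - id: github.com/my-org/infra-repo # hypothetical repo ID; /.*/ would match every repo
    # Allow this repo's atlantis.yaml to set apply_requirements and choose a workflow
    allowed_overrides: [apply_requirements, workflow]
    # Needed for workflows defined in the repo's atlantis.yaml (e.g. prod-workflow)
    allow_custom_workflows: true
```

The same server-side file can also set apply_requirements directly. This enforces them centrally, so that individual repos cannot loosen them.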
Atlantis can be deployed on Kubernetes, ECS, or even a single EC2 with Docker. It needs a public endpoint that can receive webhooks.
GitHub Actions vs Atlantis
When should you use which?
| Factor | GitHub Actions | Atlantis |
|---|---|---|
| Setup/operational overhead | None (provided by GitHub) | Self-operated server |
| Terraform-specific features | Build it yourself | Built-in |
| Locking across PRs | Handle yourself (the backend only locks state) | Automatic, per directory and workspace |
| Multi-directory coordination | Manual setup | Auto-detected |
| Learning curve | Low | Medium |
| Cost | GitHub Actions usage fees | Server operation costs |
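For the locking row above, "handle yourself" in GitHub Actions typically means a concurrency group. The backend's state lock already stops two applies from writing state at the same time, but the run that loses simply fails. A concurrency group makes runs queue instead. The following is a sketch for the apply job from the earlier workflow, where the group name is an arbitrary assumption.

```yaml
jobs:
  apply:
    # ... existing apply job settings ...
    # One apply per environment at a time; later runs wait instead of failing on the state lock
    concurrency:
      group: terraform-apply-${{ matrix.env }}
      cancel-in-progress: false
```

GitHub keeps at most one pending run per group. A newer pending run cancels an older pending one, which is usually acceptable for applies from main because the newest commit includes the earlier changes.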
When GitHub Actions fits
- Other CI/CD pipelines already on GitHub Actions besides Terraform
- Infrastructure directories are simple and change frequency is low
- Don’t want to operate a separate server
When Atlantis fits
- Infrastructure repo is large with many concurrent PRs
- Need Terraform-specific features (plan freshness validation, fine-grained permissions)
- A dedicated DevOps team can operate specialized tools
If starting small, GitHub Actions is sufficient. As the team grows and concurrent work increases, consider Atlantis adoption.
A Few Practical Tips
1) Don't bundle plan and apply in the same job
“Auto-apply after merge” is easy. But GitHub Actions has no built-in way to pause a running job so that a human can approve between plan and apply. Work around this with Environment approval gates, which hold a job before it starts, or use Atlantis.
2) Save plan to a file and use that file for apply
```shell
terraform plan -out=tfplan
terraform apply tfplan
```
If you save the plan to a file with -out and apply that file, apply fails when the state has changed since the plan was created. This rules out applying something different from the plan you reviewed. Note that tfplan files can contain sensitive information, so be careful when storing them as artifacts.
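In GitHub Actions, tips 1 and 2 combine naturally. You plan once, store the plan file as an artifact, and let an Environment approval gate sit between the plan job and the apply job that consumes the file. The following sketch covers prod only and reuses the role, region, and Terraform version from the earlier workflow. The file name, artifact name, and push trigger are assumptions. The role's trust policy must also accept both sub claims these jobs present: repo:my-org/infra-repo:ref:refs/heads/main for the plan job and repo:my-org/infra-repo:environment:prod for the apply job.

```yaml
# Hypothetical .github/workflows/terraform-prod.yml
name: Terraform prod (saved plan)
on:
  push:
    branches: [main]
    paths: ['envs/prod/**', 'modules/**']
permissions:
  contents: read
  id-token: write # for OIDC
jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: envs/prod
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/github-actions-tf-prod
          aws-region: ap-northeast-2
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
      - run: terraform init -input=false
      - run: terraform plan -input=false -out=tfplan
      # The plan file can contain secrets, so keep the artifact short-lived
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan-prod
          path: envs/prod/tfplan
          retention-days: 1
  apply:
    needs: [plan]
    runs-on: ubuntu-latest
    environment: prod # the approval gate sits between plan and apply
    defaults:
      run:
        working-directory: envs/prod
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/github-actions-tf-prod
          aws-region: ap-northeast-2
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
      - run: terraform init -input=false
      - uses: actions/download-artifact@v4
        with:
          name: tfplan-prod
          path: envs/prod
      # Applies exactly the saved plan; Terraform rejects it if the state changed since plan
      - run: terraform apply -input=false tfplan
```

The approver reads the plan in the plan job's log before approving the prod environment. Applying a saved plan file needs no -auto-approve.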
3) Set timeouts on CI runs
```yaml
jobs:
  apply:
    timeout-minutes: 30
```
Prevents apply from hanging indefinitely if something goes wrong.
4) Run a separate drift detection job
Running terraform plan -refresh-only periodically (daily) across all environments and alerting to Slack makes for a good workflow.
```yaml
on:
  schedule:
    - cron: '0 9 * * *' # Daily at 09:00 UTC (GitHub schedules run in UTC)
jobs:
  drift-check:
    # ... terraform plan -refresh-only -detailed-exitcode (exit code 2 = drift detected)
```
This quickly catches settings that someone changed by hand in the console.
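The following sketch shows one way to flesh out the drift-check job above. It reuses the matrix, role, and region from the earlier workflow. It treats any non-zero exit from -detailed-exitcode as a failure (2 for drift, 1 for an error) and posts to a Slack incoming webhook with curl. SLACK_WEBHOOK_URL is a hypothetical secret name. Scheduled runs execute on the default branch and present the sub claim repo:my-org/infra-repo:ref:refs/heads/main, which the role's trust policy must allow.

```yaml
jobs:
  drift-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write # for OIDC
    strategy:
      fail-fast: false
      matrix:
        env: [dev, prod]
    defaults:
      run:
        working-directory: envs/${{ matrix.env }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/github-actions-tf-${{ matrix.env }}
          aws-region: ap-northeast-2
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
      - run: terraform init -input=false
      # -detailed-exitcode: 0 = no drift, 1 = error, 2 = drift detected (non-zero fails the step)
      - name: Detect drift
        run: terraform plan -refresh-only -detailed-exitcode -input=false -no-color
      - name: Notify Slack
        if: failure()
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          curl -sS -X POST -H 'Content-type: application/json' \
            --data '{"text":"Terraform drift check failed for ${{ matrix.env }}"}' \
            "$SLACK_WEBHOOK_URL"
```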
5) Automate version tagging for module repos
Having a workflow that automatically applies SemVer tags when PRs are merged lets module users pin to specific versions reliably. Tools like release-please or semantic-release help.
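The following is a sketch using release-please. It assumes the module repository follows Conventional Commits and that release-please's terraform-module release type fits its layout. Placed in the module repository, this workflow keeps a release PR up to date. Merging that release PR creates the SemVer tag and a GitHub release.

```yaml
# Hypothetical .github/workflows/release.yml in a module repository
name: release-please
on:
  push:
    branches: [main]
permissions:
  contents: write      # create tags and releases
  pull-requests: write # open and update the release PR
jobs:
  release-please:
    runs-on: ubuntu-latest
    steps:
      # Computes the next version from commit messages (feat: -> minor, fix: -> patch)
      - uses: googleapis/release-please-action@v4
        with:
          release-type: terraform-module
```

Module users can then pin a tag in the module source, for example a Git source ending in ?ref=v1.2.0.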
CI/CD is the pillar that supports Terraform operational reliability. Once the cycle of PR → plan → review → merge → apply is established, it becomes transparent who changed what and when, and the possibility of incidents drops significantly. Whether big or small, any team should build this pipeline before moving forward.
In the next part, we’ll cover testing and policy validation. We’ll look at how to ensure Terraform code quality and security with Terratest, Checkov, tfsec, and OPA.

