Table of contents
- Why Infrastructure Needs Testing Too
- First Things First — fmt and validate
- Static Security Analysis — Checkov and tfsec
- Policy Validation — OPA
- Integration Testing — Terratest
- terraform test — Built-in Test Framework
- pre-commit Hooks — Catch Issues Locally First
- tflint — Linter
- Full Pipeline Configuration
- Tool Selection Guide
Why Infrastructure Needs Testing Too
We take testing application code for granted. So why don’t we test infrastructure code? It’s because the “just deploy it and see” mindset persists. But Terraform code has bugs, security gaps, and policy violations too.
- A security group opens SSH to 0.0.0.0/0
- An S3 bucket is set to public read
- An IAM policy allows *:*
- Tags are missing
These issues should be caught before they’re applied. Discovering them after deployment is already too late.
flowchart LR
Code["HCL Code"] --> L1["Stage 1:\nfmt / validate"]
L1 --> L2["Stage 2:\nStatic security analysis\n(Checkov, tfsec)"]
L2 --> L3["Stage 3:\nPolicy validation\n(OPA, Sentinel)"]
L3 --> L4["Stage 4:\nIntegration tests\n(Terratest)"]
L4 --> Apply["apply stage"]
These stages differ in cost and speed. The further left, the faster and cheaper but narrower in scope; the further right, the slower and more expensive but verifying actual behavior.
First Things First — fmt and validate
The fastest checks Terraform provides out of the box. Run them always, both locally and in CI.
terraform fmt — Code style formatting
# Auto-format all files
terraform fmt -recursive
# In CI, only "check if there are unformatted files"
terraform fmt -check -recursive
A tool that ensures the entire team uses the same style. It normalizes indentation, spacing, and the alignment of = signs; it does not reorder attributes. -check returns a non-zero exit code if any file isn’t formatted, which fails CI.
terraform validate — Syntax validation
terraform validate
Checks whether HCL syntax is correct, whether referenced variables and resources exist, and whether types match. Must be run after terraform init so the provider schema is available for validation.
Success! The configuration is valid.
Error: Reference to undeclared input variable
on main.tf line 5, in resource "aws_instance" "app":
5: instance_type = var.size
It doesn’t catch everything: a mistyped value such as a nonexistent instance type, or a logic error, still slips through. But it’s the fastest and easiest first gate.
Static Security Analysis — Checkov and tfsec
Tools that parse HCL and check “whether this code violates security rules.” They can run without applying and without state, making them great for continuous use during development.
Checkov
Made by Bridgecrew (now part of Palo Alto Networks). Supports not just Terraform but also CloudFormation, Kubernetes, Dockerfile, and various other formats.
# Install
pip install checkov
# Run
checkov -d .
checkov -d envs/prod --framework terraform
Results look like this.
Check: CKV_AWS_24: "Ensure no security groups allow ingress from 0.0.0.0:0 to port 22"
FAILED for resource: aws_security_group.web
File: /envs/prod/main.tf:45-60
45 | resource "aws_security_group" "web" {
...
55 | ingress {
56 | from_port = 22
57 | to_port = 22
58 | protocol = "tcp"
59 | cidr_blocks = ["0.0.0.0/0"]
60 | }
Guide: https://docs.bridgecrew.io/docs/networking_1
It tells you which rule was violated, which file and line number, and even provides a guide link.
When you need to skip a specific rule, leave a comment inside the resource block it applies to.
resource "aws_security_group" "internal" {
  # checkov:skip=CKV_AWS_24:Internal network, allowed
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8"]
  }
}
Always leave a reason in the skip comment — that’s the principle. Overusing skip is the same as “checking nothing.”
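One way to enforce the "always leave a reason" principle is a small check that flags skip comments with nothing after the rule ID. The following is a minimal sketch, not part of Checkov itself; the function name `findSkipsWithoutReason` and the `Finding` type are hypothetical, and the pattern assumes the `checkov:skip=<ID>:<reason>` comment form used in this article.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// skipPattern matches Checkov suppression comments such as
// "# checkov:skip=CKV_AWS_24:Internal network, allowed".
// Group 1 is the check ID, group 2 the optional reason.
var skipPattern = regexp.MustCompile(`checkov:skip=([A-Za-z0-9_]+)(?::(.*))?`)

// Finding describes one skip comment that has no reason attached.
// (Hypothetical type, introduced only for this sketch.)
type Finding struct {
	Line    int
	CheckID string
}

// findSkipsWithoutReason scans Terraform source text line by line and
// returns every checkov:skip comment whose reason is missing or blank.
func findSkipsWithoutReason(src string) []Finding {
	var findings []Finding
	scanner := bufio.NewScanner(strings.NewReader(src))
	for lineNo := 1; scanner.Scan(); lineNo++ {
		m := skipPattern.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		if strings.TrimSpace(m[2]) == "" {
			findings = append(findings, Finding{Line: lineNo, CheckID: m[1]})
		}
	}
	return findings
}

func main() {
	src := `resource "aws_security_group" "internal" {
  # checkov:skip=CKV_AWS_24:Internal network, allowed
  # checkov:skip=CKV_AWS_23
}`
	for _, f := range findSkipsWithoutReason(src) {
		fmt.Printf("line %d: %s is skipped without a reason\n", f.Line, f.CheckID)
	}
}
```

In this sample only the second skip comment is reported, because the first one carries a reason. A check like this could run next to the other linters in CI, so that unexplained skips fail review rather than accumulate.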
tfsec
Made by Aqua Security. Being Terraform-specific makes it a bit lighter.
# Install (macOS)
brew install tfsec
# Run
tfsec .
Result #1 CRITICAL Security group rule allows ingress from public internet.
──────────────────────────────────────────
envs/prod/main.tf:55
──────────────────────────────────────────
53 ingress {
54 from_port = 22
55 → cidr_blocks = ["0.0.0.0/0"]
56 protocol = "tcp"
57 to_port = 22
──────────────────────────────────────────
ID AVD-AWS-0107
Impact Your port is exposed to the whole internet
Resolution Set a more restrictive cidr range
Checkov and tfsec overlap considerably. Some teams use both, some use just one. Start light with one and add more if needed. Personally, tfsec feels cleaner in output since it’s Terraform-specialized.
Policy Validation — OPA
Checkov and tfsec check predefined rules. Organization-specific policies (e.g., “all resources must have an Owner tag”) need to be expressed as custom policies. This is where OPA (Open Policy Agent) comes in.
OPA is a general-purpose policy engine that uses a language called Rego. It can be used with Kubernetes, Envoy, Terraform, and more. For Terraform, it’s commonly used through a wrapper called conftest.
First, get the JSON from terraform plan.
terraform plan -out=tfplan
terraform show -json tfplan > plan.json
Write the policy in Rego.
# policies/tags.rego
package main
required_tags := {"Owner", "Environment", "Service"}
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_instance"
tag := required_tags[_]
not resource.change.after.tags[tag]
msg := sprintf("Instance %v is missing required tag %v", [resource.address, tag])
}
This policy means “all EC2 instances must have Owner, Environment, and Service tags.” Each message the deny rule produces is reported as a policy violation.
# Check
conftest test --policy policies/ plan.json
FAIL - plan.json - Instance aws_instance.app is missing required tag Environment
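To make it clearer which parts of the plan JSON the tag policy reads, here is the same check sketched in Go. This is only an illustration of the logic, not a replacement for conftest; the struct mirrors just the `resource_changes[].address`, `type`, and `change.after.tags` fields that the Rego rule touches, and the names `plan`, `missingTagMessages`, and `samplePlan` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan mirrors only the fields of `terraform show -json` output that the
// Rego policy reads. Everything else in the plan is ignored.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Type    string `json:"type"`
		Change  struct {
			After struct {
				Tags map[string]interface{} `json:"tags"`
			} `json:"after"`
		} `json:"change"`
	} `json:"resource_changes"`
}

var requiredTags = []string{"Owner", "Environment", "Service"}

// missingTagMessages returns one message per aws_instance and required tag
// that is absent, using the same wording as the Rego deny rule.
func missingTagMessages(planJSON []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, err
	}
	var msgs []string
	for _, rc := range p.ResourceChanges {
		if rc.Type != "aws_instance" {
			continue
		}
		for _, tag := range requiredTags {
			// Like the Rego rule, only the presence of the key is checked.
			if _, ok := rc.Change.After.Tags[tag]; !ok {
				msgs = append(msgs, fmt.Sprintf(
					"Instance %v is missing required tag %v", rc.Address, tag))
			}
		}
	}
	return msgs, nil
}

// samplePlan is a hand-written, heavily trimmed stand-in for plan.json.
const samplePlan = `{"resource_changes":[{"address":"aws_instance.app",
"type":"aws_instance","change":{"after":{"tags":{"Owner":"platform","Service":"api"}}}}]}`

func main() {
	msgs, err := missingTagMessages([]byte(samplePlan))
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println("FAIL -", m)
	}
}
```

The Rego version stays the better fit for real use, since conftest handles loading, reporting, and exit codes. The Go form is mainly useful for seeing that the policy is a plain loop over `resource_changes` with a filter on `type`.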
More complex examples are possible too, like “only certain instance types are allowed in production.”
package main
allowed_prod_types := {"m5.large", "m5.xlarge", "r5.large"}
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_instance"
resource.change.after.tags.Environment == "prod"
instance_type := resource.change.after.instance_type
not allowed_prod_types[instance_type]
msg := sprintf(
"Instance type %v is not allowed in production. Allowed: %v",
[instance_type, allowed_prod_types]
)
}
Codifying organizational policies lets anyone discover violations proactively. Questions like “What were the production rules again?” disappear.
Integration Testing — Terratest
The heaviest test. It actually creates cloud resources and verifies they work as intended. Written in Go.
// test/vpc_test.go
package test

import (
	"fmt"
	"testing"
	"time"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"cidr_block": "10.99.0.0/16",
			// Timestamp suffix keeps concurrent test runs from colliding
			"environment": fmt.Sprintf("test-%d", time.Now().Unix()),
		},
	}

	// Always clean up when test ends
	defer terraform.Destroy(t, terraformOptions)

	// Run apply
	terraform.InitAndApply(t, terraformOptions)

	// Verify outputs
	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.Regexp(t, "^vpc-", vpcId)

	subnets := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
	assert.Equal(t, 2, len(subnets))
}
This test actually creates a VPC and subnets, verifies outputs match expectations, and cleanly deletes everything at the end.
cd test
go test -v -timeout 30m
The trade-offs are extreme.
Advantages
- Validates actual behavior — catches things static analysis misses
- Excellent for regression prevention in complex modules
Disadvantages
- Each test takes minutes to tens of minutes
- Incurs real cloud costs
- Risk of leftover resources if tests fail mid-way
That’s why Terratest is applied only to frequently used core modules. Looking at how many Terratest suites official open-source modules maintain gives you a good benchmark. Applying it to all code has a poor cost-to-benefit ratio.
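The leftover-resource risk can be reduced if test environments are recognizable by name. Because the test above names its environment `test-<unix timestamp>`, a cleanup job could pick out old ones by age. The sketch below covers only that age check; how the names are listed from the cloud account is out of scope, and it assumes the module actually propagates `environment` into a resource name or tag. `staleEnvironments`, `demoNow`, and `demoMaxAge` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// envPrefix matches the naming used by the Terratest example:
// "test-<unix seconds>".
const envPrefix = "test-"

// staleEnvironments returns the names whose embedded creation timestamp is
// older than maxAge. Names that do not follow the pattern are ignored, so
// long-lived environments such as "prod" are never reported.
func staleEnvironments(names []string, now time.Time, maxAge time.Duration) []string {
	var stale []string
	for _, name := range names {
		if !strings.HasPrefix(name, envPrefix) {
			continue
		}
		secs, err := strconv.ParseInt(strings.TrimPrefix(name, envPrefix), 10, 64)
		if err != nil {
			continue // not a timestamp suffix; leave it alone
		}
		if now.Sub(time.Unix(secs, 0)) > maxAge {
			stale = append(stale, name)
		}
	}
	return stale
}

// Fixed demo inputs so the example is deterministic.
var demoNow = time.Unix(1700010000, 0)

const demoMaxAge = 2 * time.Hour

func main() {
	names := []string{"test-1700000000", "test-1700009000", "prod", "test-abc"}
	for _, name := range staleEnvironments(names, demoNow, demoMaxAge) {
		fmt.Println("stale test environment:", name)
	}
}
```

Ignoring anything that does not parse is deliberate: a cleanup job should err on the side of leaving unknown resources untouched.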
terraform test — Built-in Test Framework
Starting from Terraform 1.6, there’s a native test framework. You can write tests in HCL alone, without Go.
# tests/vpc.tftest.hcl
run "valid_cidr" {
command = plan
variables {
cidr_block = "10.99.0.0/16"
environment = "test"
}
assert {
condition = aws_vpc.this.cidr_block == "10.99.0.0/16"
error_message = "VPC CIDR must match input value"
}
}
run "creates_two_subnets_by_default" {
command = plan
variables {
cidr_block = "10.99.0.0/16"
environment = "test"
}
assert {
condition = length(aws_subnet.public) == 2
error_message = "Should create 2 subnets for the default 2 AZs"
}
}
Run like this.
terraform test
With command = plan, it validates against the plan only, without applying. No real resources are created, so it’s fast and free. Changing to command = apply (also the default when command is omitted) creates real resources for the test, which incurs cost.
This built-in framework is ideal for simple validations like “do outputs change as expected with inputs” and “does conditional logic behave correctly.” Complex integration tests still favor Terratest, but everyday module testing is well-served by the built-in framework.
pre-commit Hooks — Catch Issues Locally First
Running basic checks at commit time locally, before CI, gives much faster feedback. Use the pre-commit tool.
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.88.4
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
        args:
          - --args=--only=terraform_required_version
          - --args=--only=terraform_required_providers
      - id: terraform_tfsec
      - id: terraform_docs
        args:
          - --args=--output-file README.md
Installation and activation.
# Install (macOS)
brew install pre-commit
# Activate hooks
pre-commit install
# Manual run (all files)
pre-commit run --all-files
Now every git commit automatically runs fmt, validate, tflint, and tfsec. If there’s an issue, the commit fails. Catch problems on the developer’s laptop before they reach CI.
flowchart LR
Edit["Edit code"] --> Commit["git commit"]
Commit --> Hook["pre-commit hook"]
Hook --> Check{"Checks pass?"}
Check -->|"Fail"| Fix["Fix and re-commit"]
Check -->|"Pass"| Push["git push"]
Push --> CI["CI Pipeline\n(heavier tests)"]
tflint — Linter
tflint is a Terraform-specific linter. Through its rulesets it covers far more than terraform validate does, including provider-specific rules.
brew install tflint
tflint --init
tflint
You can enable AWS provider rulesets via .tflint.hcl configuration.
plugin "aws" {
enabled = true
version = "0.30.0"
source = "github.com/terraform-linters/tflint-ruleset-aws"
}
rule "terraform_unused_declarations" {
enabled = true
}
rule "terraform_deprecated_interpolation" {
enabled = true
}
tflint catches invalid instance types, nonexistent AMI IDs, unused variables, and more.
Full Pipeline Configuration
Here’s an example integrating all the tools covered so far into CI.
# .github/workflows/terraform-ci.yml
name: Terraform CI
on:
  pull_request:
    paths: ['**/*.tf', '**/*.tftest.hcl']
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
          # Disable the wrapper so redirected output (plan.json) stays pure JSON
          terraform_wrapper: false
      - name: Format
        run: terraform fmt -check -recursive
      - name: Init (all envs/*)
        run: |
          for dir in envs/*; do
            (cd "$dir" && terraform init -backend=false)
          done
      - name: Validate
        run: |
          for dir in envs/*; do
            (cd "$dir" && terraform validate)
          done
      - name: Set up tflint
        uses: terraform-linters/setup-tflint@v4
      - name: tflint
        run: |
          tflint --init
          tflint --recursive
      - name: tfsec
        uses: aquasecurity/tfsec-action@v1.0.3
      - name: Checkov
        uses: bridgecrewio/checkov-action@master
        with:
          directory: .
          framework: terraform
          soft_fail: false
      - name: OPA Policy Check
        # Assumes conftest is installed on the runner and that cloud
        # credentials and backend access are configured. plan needs a real
        # backend, so init is re-run here without -backend=false.
        run: |
          for dir in envs/*; do
            (cd "$dir" && terraform init -input=false && terraform plan -input=false -out=tfplan && terraform show -json tfplan > plan.json)
            conftest test --policy policies/ "$dir/plan.json"
          done
      - name: terraform test
        run: |
          for mod in modules/*; do
            if [ -d "$mod/tests" ]; then
              (cd "$mod" && terraform test)
            fi
          done
Each stage has different costs. fmt and validate are fastest (seconds), static analysis next (tens of seconds), OPA next (minutes, including plan), and Terratest, which this workflow leaves out, is heaviest (tens of minutes). Ordering the pipeline from cheapest to most expensive ensures fast failure and saves overall time.
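The fail-fast idea behind this ordering can be sketched in a few lines of Go. This is an illustration of the principle only; a CI job already stops at the first failing step by default. The `stage` type, `runFailFast`, and `errStageFailed` are hypothetical names, and the stage functions are stand-ins for the real commands.

```go
package main

import (
	"errors"
	"fmt"
)

// stage is one pipeline step: a name plus the check to run.
type stage struct {
	name string
	run  func() error
}

// errStageFailed is a placeholder error for a failing check.
var errStageFailed = errors.New("check failed")

// runFailFast executes stages in the given order and stops at the first
// error. It returns the names of the stages that actually ran, so the
// caller can see which (more expensive) stages were skipped.
func runFailFast(stages []stage) ([]string, error) {
	var ran []string
	for _, s := range stages {
		ran = append(ran, s.name)
		if err := s.run(); err != nil {
			return ran, fmt.Errorf("stage %q: %w", s.name, err)
		}
	}
	return ran, nil
}

func main() {
	ok := func() error { return nil }
	fail := func() error { return errStageFailed }

	// Cheapest checks first, most expensive last.
	stages := []stage{
		{"fmt", ok},
		{"validate", ok},
		{"tfsec", fail},
		{"conftest", ok},
		{"terraform test", ok},
	}

	ran, err := runFailFast(stages)
	fmt.Println("ran:", ran)
	fmt.Println("error:", err)
}
```

With the failing tfsec stand-in, the plan-based conftest stage and terraform test never start, which is where the time saving comes from.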
Tool Selection Guide
Many tools have been covered. What should you use?
| Purpose | Essential | Recommended | Optional |
|---|---|---|---|
| Syntax/style | terraform fmt, validate | tflint | |
| Static security analysis | tfsec or Checkov | Both | |
| Organization policy | OPA/conftest | Sentinel (Terraform Cloud) | |
| Unit testing | terraform test | | |
| Integration testing | Terratest (core modules only) | | |
| Local integration | pre-commit | | |
You don’t need to adopt everything at once. Build up incrementally.
- Start with fmt, validate, and pre-commit setup
- Add tfsec or Checkov to CI
- Add terraform test for frequently changing core modules
- Introduce OPA when organizational policies become clear
- Terratest only for truly critical modules
Infrastructure has quality standards too. “If it works, it’s fine” might pass once, but it won’t hold up at scale. Automated testing and policy validation form the foundation of trust in infrastructure code.
In the next part, we wrap up the series by compiling practical patterns and pitfalls. Directory structures, tagging strategies, common incidents, and large-scale migration — all in one sweep.

