
Terraform Part 7 — Data Sources and Import

· 8 min read
Terraform Series (7/15)
  1. Terraform Part 1 — What Is Terraform
  2. Terraform Part 2 — Installation and First Deploy
  3. Terraform Part 3 — HCL Syntax
  4. Terraform Part 4 — Variables and Outputs
  5. Terraform Part 5 — Providers
  6. Terraform Part 6 — Resources and Dependencies
  7. Terraform Part 7 — Data Sources and Import
  8. Terraform Part 8 — State Management
  9. Terraform Part 9 — Modules
  10. Terraform Part 10 — Loops and Conditionals
  11. Terraform Part 11 — Workspaces and Environment Separation
  12. Terraform Part 12 — Kubernetes and Helm Providers
  13. Terraform Part 13 — CI/CD Integration
  14. Terraform Part 14 — Testing and Policy
  15. Terraform Part 15 — Practical Patterns and Pitfalls
The World Already Exists Outside Terraform

The series so far has flowed from the premise of “building infrastructure from scratch with Terraform on a blank cloud.” But reality is different. When you join a company, there’s already a VPC someone created via the console, an S3 bucket someone manually set up that’s been running for three years, and the networking lives in a separate Terraform project managed by another team. Even when building a new system, it needs to connect to those existing assets.

This part covers two tools for dealing with that reality: data sources, which query existing resources you don't manage, and terraform import, which brings existing resources under your management.

The difference lies in “ownership.” A data source says “I didn’t create it and I don’t need to delete it, just tell me the values,” while import says “I’m going to manage it from now on.”

flowchart LR
    subgraph EXIST[Already Existing Resources]
        R[VPC created via console<br/>Managed by another team]
    end

    subgraph ME[My Terraform]
        D[data block<br/>Read only]
        I[terraform import<br/>Bring under management]
    end

    R -->|Query| D
    R -->|Incorporate| I
    D -->|Use attributes| USE1[My resources]
    I -->|Managed by Terraform going forward| TF[Recorded in tfstate]

data Blocks — Read-Only Resources

data looks similar to resource on the surface. It takes two labels (type and name) and receives lookup conditions as arguments. The difference is that Terraform does not create, modify, or delete anything. During terraform plan and apply, it simply calls the AWS API to find a resource matching the given conditions and reads its attributes.

AMI Lookup — The Most Common Pattern

This is a pattern we briefly saw in Part 2.

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
}

Instead of hardcoding the AMI ID like "ami-0c55b159cbfafe1f0", it finds “the latest al2023 published by Amazon” at runtime. When a new AMI is released, it automatically follows. The reference syntax is similar to resources, but note the data. prefix.

Default VPC/Subnet Lookup

Each region has a default VPC that AWS creates automatically. For simple experiments, you might want to use it as-is.

data "aws_vpc" "default" {
  default = true
}

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = tolist(data.aws_subnets.default.ids)[0]
}

aws_subnets (plural) returns a list, so tolist(...)[0] extracts the first subnet. This way you can place resources on top of the default network.

Referencing Resources Managed by Other Teams

This is a more important use case in practice. Suppose the network team follows a structure where “we manage VPCs with Terraform, and each team deploys services on top of them.” The service team needs the VPC ID but doesn’t manage it.

There are several approaches.

# 1. Look up by tags
data "aws_vpc" "platform" {
  tags = {
    Name = "platform-main"
    Env  = "prod"
  }
}

# 2. Receive the ID as a variable
variable "vpc_id" {
  type = string
}

data "aws_vpc" "platform" {
  id = var.vpc_id
}

# 3. Read outputs from another Terraform State (remote state)
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "myorg-terraform-state"
    key    = "network/prod/terraform.tfstate"
    region = "ap-northeast-2"
  }
}

# Usage
resource "aws_subnet" "my_service" {
  vpc_id     = data.terraform_remote_state.network.outputs.vpc_id
  cidr_block = "10.0.100.0/24"
}

Approach 3 (terraform_remote_state) reads values that the network team exposed as outputs directly from our project. It’s clean but creates tight coupling (if the output structure of the remote state changes, our code breaks too). As organizations grow, it’s better for maintainability to either create a dedicated data source for network information lookup (e.g., store values in aws_ssm_parameter) or use tag-based lookups for loose coupling.
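As a sketch of the SSM-based approach (parameter names and resource names here are illustrative): the network team publishes the values it wants to expose, and service teams read them without depending on the network state's output structure.

```hcl
# Network team's project — publish the VPC ID (parameter name is illustrative)
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/platform/prod/vpc-id"
  type  = "String"
  value = aws_vpc.main.id
}

# Service team's project — read it with no reference to the network team's state
data "aws_ssm_parameter" "vpc_id" {
  name = "/platform/prod/vpc-id"
}

resource "aws_subnet" "my_service" {
  vpc_id     = data.aws_ssm_parameter.vpc_id.value
  cidr_block = "10.0.100.0/24"
}
```

If the network team later restructures their state, the parameter name stays stable, so the service team's code is unaffected.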

Account Information, Region, Availability Zones

Frequently used metadata can also be queried via data sources.

data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
  azs        = data.aws_availability_zones.available.names
}

resource "aws_s3_bucket" "logs" {
  # A unique name that includes account ID and region
  bucket = "myapp-logs-${local.account_id}-${local.region}"
}

This prevents the mistake of hardcoding account IDs and makes it easier to reuse the same code across multiple regions.

Summary of Differences Between data and resource

flowchart TB
    subgraph RES[resource]
        RC[Create]
        RU[Update]
        RD[Delete]
        RS[Read]
    end

    subgraph DATA[data]
        DS2[Read only]
    end

    APPLY[terraform apply] --> RC
    APPLY --> RU
    APPLY --> RD
    APPLY --> RS
    APPLY --> DS2

    DESTROY[terraform destroy] -.->|Affected| RD
    DESTROY -.->|Not affected| DS2

Data blocks re-query the actual cloud's current values on every plan and apply; the results are not cached across runs. Each data block costs network calls, so if dozens of data blocks are redundantly querying the same resource, consider refactoring to query once and store the result in locals.
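A sketch of that refactoring (names are illustrative): one data block, one local, many references.

```hcl
# Query once...
data "aws_vpc" "platform" {
  tags = { Name = "platform-main" }
}

locals {
  vpc_id = data.aws_vpc.platform.id
}

# ...and reference local.vpc_id instead of declaring the same data block again
resource "aws_security_group" "api" {
  vpc_id = local.vpc_id
}

resource "aws_security_group" "worker" {
  vpc_id = local.vpc_id
}
```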

terraform import — Incorporating Existing Resources

Now let’s go in the opposite direction. Bringing “resources Terraform doesn’t know about” — like S3 buckets or RDS instances created via the console — under Terraform management.

The purpose of import is clear. Register existing resources in the State without recreating them, so Terraform manages them going forward. Deleting and recreating a production DB would be disastrous, so this approach is essential.

The Old Way (CLI Command)

Up through Terraform 1.4, the process used CLI commands. This classic approach still works today.

# 1. Declare an empty resource in .tf file
cat > imported.tf <<EOF
resource "aws_s3_bucket" "legacy_logs" {
  bucket = "legacy-logs-bucket"
}
EOF

# 2. Import into State
terraform import aws_s3_bucket.legacy_logs legacy-logs-bucket

# 3. Check diff between current configuration and actual state with plan
terraform plan

The import command format is terraform import <resource_address> <resource_ID>. The resource ID format varies by resource type (EC2 is i-xxx, S3 is the bucket name, RDS is the identifier). Check the “Import” section at the bottom of each resource’s official documentation.

The problem is that this approach only modifies the State without touching the code. After importing, running terraform plan produces a massive diff saying “my .tf file only has the bucket name, but the actual bucket has versioning, encryption, CORS settings, and everything else.” Transferring all of this to code one by one was incredibly tedious.

Declarative import Blocks (Terraform 1.5+)

So starting from 1.5, import blocks were added. In the declarative style, you write “I’m importing this resource to this address” in your code.

# imports.tf
import {
  to = aws_s3_bucket.legacy_logs
  id = "legacy-logs-bucket"
}

# Resource block is still required
resource "aws_s3_bucket" "legacy_logs" {
  bucket = "legacy-logs-bucket"
  # ... remaining configuration
}

Running terraform plan in this state shows Terraform’s plan to “import this resource, and here’s the diff between the current .tf and the actual state” all at once. Running apply handles the import automatically.
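The id field uses the same per-resource-type formats as the CLI command. A sketch with hypothetical IDs:

```hcl
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0" # EC2: instance ID (hypothetical)
}

import {
  to = aws_db_instance.main
  id = "prod-main-db" # RDS: DB identifier (hypothetical)
}
```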

Even better is the code generation option.

terraform plan -generate-config-out=imported.tf

Terraform inspects the actual resource’s attributes and auto-generates .tf code. You read the generated code and keep only the necessary parts while cleaning up. The tedious manual transcription is greatly reduced.

flowchart LR
    subgraph OLD[Old Approach]
        C1[Write empty resource] --> C2[terraform import]
        C2 --> C3[Check diff with plan]
        C3 --> C4[Modify code to match reality]
    end

    subgraph NEW[1.5+ import Block]
        N1[Write import block] --> N2["terraform plan<br/>-generate-config-out=..."]
        N2 --> N3[Clean up auto-generated<br/>code]
        N3 --> N4[terraform apply]
    end

For new projects, use the 1.5+ approach. The legacy import command still works, but declarative import is much more Terraform-native.

Things to Watch Out for When Importing

Since import only fills in the State, there are a few pitfalls.

  1. Carefully review the diff between code and reality. If a plan shows dozens of attribute changes at once, it’s hard to predict what will be recreated. If you see -/+, that means an unintended recreation. Fix the code to match reality first, then apply
  2. Tags often mismatch. If you’ve set default_tags on the Provider, existing resources won’t have those, so everything shows up as “add tag” changes. These are harmless but make the plan messy
  3. Set prevent_destroy on sensitive resources first. If you accidentally run destroy right after import, it’s catastrophic. When importing a production DB, start by adding lifecycle { prevent_destroy = true } to the resource
  4. For resources inside modules, get the address right. Use the format module.foo.aws_instance.bar
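A minimal sketch of point 3, assuming a hypothetical production DB:

```hcl
resource "aws_db_instance" "prod" {
  identifier = "prod-main-db" # hypothetical identifier
  # ... engine, instance_class, and other settings go here

  lifecycle {
    # Any plan that would destroy this resource now fails with an error
    prevent_destroy = true
  }
}
```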

moved — Changing Resource Addresses During Refactoring

Another tool closely related to data sources and import is the moved block. Added in Terraform 1.1. Use it when you want to “change only the internal address without recreating the resource.”

This is a common situation in practice — moving a resource into a module during refactoring.

# Old structure
resource "aws_s3_bucket" "logs" {
  bucket = "myapp-logs"
}

# New structure — moved to a module
module "logs" {
  source = "./modules/bucket"
  name   = "myapp-logs"
}

If you make this change, Terraform will try to delete aws_s3_bucket.logs and create module.logs.aws_s3_bucket.this as new. The actual bucket gets deleted and recreated. In a production environment, this is an unacceptable change.

The solution is the moved block.

moved {
  from = aws_s3_bucket.logs
  to   = module.logs.aws_s3_bucket.this
}

With this declaration, Terraform recognizes it as “just changing the address in the State internally.” It doesn’t touch the actual resource. Refactoring becomes safe.

Previously, this was done with CLI commands like terraform state mv, which was imperative and non-reproducible. moved blocks are declarative and tracked in Git, so team members can see the history.
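moved also covers plain renames, not just moves into modules. A sketch:

```hcl
# Rename aws_s3_bucket.logs to aws_s3_bucket.app_logs without recreating it
resource "aws_s3_bucket" "app_logs" {
  bucket = "myapp-logs"
}

moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.app_logs
}
```

After every environment that shares this code has applied the change, the moved block can safely be removed.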

Practical Flow for Moving Console Resources to Terraform

Combining the tools from this part, let’s say you get a task at work: “We have an RDS created via the console, and now we want to manage it with Terraform.” Here’s what the practical steps look like.

sequenceDiagram
    participant Dev as Developer
    participant TF as Terraform
    participant AWS as AWS

    Dev->>AWS: Investigate current RDS configuration (console, CLI)
    Dev->>TF: Write import block + resource
    Note over TF: lifecycle { prevent_destroy = true }
    Dev->>TF: terraform plan -generate-config-out
    TF-->>Dev: Auto-generated code
    Dev->>TF: Clean up & review code
    Dev->>TF: terraform plan (final)
    TF-->>Dev: No changes / only safe changes
    Dev->>TF: terraform apply
    TF->>AWS: Register RDS in State
    Note over Dev,AWS: Terraform manages it from here on

In words, the steps are as follows.

  1. Investigate: Identify the actual RDS engine, version, storage, VPC, security groups, and parameter groups
  2. Write: Create a corresponding aws_db_instance in .tf, add an import block, and include prevent_destroy up front
  3. Auto-generate: Use terraform plan -generate-config-out=... to generate code from the current state
  4. Clean up: Remove attributes from the generated code that Terraform can’t manage (like status, which AWS determines), and add attributes you don’t want to manage to ignore_changes
  5. Verify: Confirm that terraform plan shows “No changes” or only acceptable changes
  6. Apply: From this point on, Terraform takes ownership
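Step 4's ignore_changes can be sketched like this (the attribute choice is illustrative):

```hcl
resource "aws_db_instance" "main" {
  identifier     = "prod-main-db" # hypothetical
  engine         = "postgres"
  instance_class = "db.t3.medium"
  # ... remaining configuration

  lifecycle {
    prevent_destroy = true

    # The password is rotated outside Terraform, so don't fight over it
    ignore_changes = [password]
  }
}
```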

It looks complex on the surface, but the 1.5+ approach is far less painful than before. If your company has a lot of console legacy, this process is worth doing as a team effort.

When to Use Querying vs Incorporation

Let’s wrap up with the selection criteria.

  1. data source: another team or process owns the resource, and you only need to read its values
  2. terraform import: your team should own the resource going forward, so register it in the State
  3. moved: the resource is already yours, and only its address in the code is changing

When these three tools are combined, you can gradually gain Terraform’s benefits even for infrastructure that wasn’t created from scratch. This is especially valuable in environments where legacy and new code coexist.

Moving Past the Basics

Over seven parts, we’ve covered Terraform’s fundamentals. From IaC concepts through installation, HCL syntax, variables, providers, resources, and connecting with existing infrastructure. If you’ve made it this far, reading, modifying, and adding features to typical AWS/GCP infrastructure project code shouldn’t be difficult.

Starting from Part 8, we dive into practical operational topics. We’ll cover State management, modules, iteration patterns, environment separation, CI/CD integration, and testing and policies — tackling Terraform at scale.


In the next part, we’ll cover how to store Terraform State remotely and share it safely across a team.

Part 8: State Management



