Photo by T.J. Breshears / Unsplash

How to Extend Terraform Scope With Data Sources

Push back the interconnection frontiers with data sources

Guillaume Vincent

Terraform is one of the tools to have in your toolbox. It allows you to manage your infrastructure as code and supports many cloud and service providers like AWS, GCP, and plenty of others.

If the existing providers do not cover your needs, you can develop your own. HashiCorp provides an SDK for that, but you will have to write Go code. Developing a provider takes time, and it is not always necessary.

To read the state of other parts of your infrastructure, such as a REST API, you can often simply use data sources.

What Are Data Sources?

Data sources allow data to be fetched or computed for use in your code. The source can be another Terraform configuration or something outside Terraform. Unlike resources, data sources are not managed by Terraform.

They present read-only views of pre-existing data, or compute new values on the fly within Terraform. The code snippet below shows an example of data source usage:

# Find the latest available AMI that is tagged with Component = web
data "aws_ami" "web" {
  filter {
    name   = "state"
    values = ["available"]
  }

  filter {
    name   = "tag:Component"
    values = ["web"]
  }

  most_recent = true
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.web.id
  instance_type = "t1.micro"
}

The aws_ami data source retrieves an AWS AMI (Amazon Machine Image) matching the defined filters. The AMI must be in the available state and have its Component tag set to web. If several AMIs match, the most recent one is returned because of the most_recent parameter.
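To make the selection logic concrete, here is a minimal Python sketch of what the two filters and most_recent amount to. It does not call AWS: the AMIS list, its field names, and the select_ami function are hypothetical stand-ins used only for illustration.

```python
from datetime import datetime

# Hypothetical, simplified AMI records; real AMIs carry many more attributes.
AMIS = [
    {"id": "ami-001", "state": "available", "tags": {"Component": "web"},
     "creation_date": "2021-01-10T00:00:00Z"},
    {"id": "ami-002", "state": "available", "tags": {"Component": "web"},
     "creation_date": "2021-03-05T00:00:00Z"},
    {"id": "ami-003", "state": "pending", "tags": {"Component": "web"},
     "creation_date": "2021-04-01T00:00:00Z"},
    {"id": "ami-004", "state": "available", "tags": {"Component": "db"},
     "creation_date": "2021-05-01T00:00:00Z"},
]


def select_ami(amis, state, component):
    """Keep the images matching every filter, then return the newest one."""
    matches = [
        ami for ami in amis
        if ami["state"] == state and ami["tags"].get("Component") == component
    ]
    if not matches:
        raise LookupError("no AMI matches the filters")
    # Equivalent of most_recent = true: pick the latest creation date.
    return max(
        matches,
        key=lambda ami: datetime.strptime(ami["creation_date"], "%Y-%m-%dT%H:%M:%SZ"),
    )


selected = select_ami(AMIS, state="available", component="web")
print(selected["id"])  # ami-002
```

The pending image and the db image are discarded by the filters, and the newer of the two remaining images wins.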

Once the data source has been read, its id attribute is used in the aws_instance.web resource. This is only one of many available data sources. You can find them in the documentation of your provider.

In a project, data sources are very useful to:

  • Reduce the coupling between your modules by using your infrastructure as the source of truth.
  • Hide complexity from the Terraform end user by reducing the number of variables.

With data sources, you can even get information from your own custom applications. To simulate this use case, we will use JSONPlaceholder, a fake REST API:

$ curl https://jsonplaceholder.typicode.com/todos/1
{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}

Communicate With a REST API With Data Sources

The HTTP data source

The HTTP data source makes an HTTP request to a given URL and exports information about the response:

variable "todo_id" {
  type    = number
  default = 1
}

data "http" "this" {
  url = "https://jsonplaceholder.typicode.com/todos/${var.todo_id}"

  request_headers = {
    Accept = "application/json"
  }
}

output "todo" {
  # Newer releases of the hashicorp/http provider deprecate body in favor of response_body
  value = data.http.this.body
}

The Terraform code returns the body of the HTTP response in the output:

$ terraform init
$ terraform apply
data.http.this: Refreshing state... [id=https://jsonplaceholder.typicode.com/todos/1]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
todo = {
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}
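Note that the todo output is the raw response body, a single string that happens to contain JSON, not a structured object. Inside Terraform, the built-in jsondecode function performs the conversion. The Python lines below are only an illustration of the difference between the raw string and the decoded object, reusing the payload shown above.

```python
import json

# The response body as the data source exposes it: one string.
body = '{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

# Decoding turns the string into a structure whose fields can be read.
todo = json.loads(body)

print(type(body).__name__)  # str
print(todo["title"])        # delectus aut autem
print(todo["completed"])    # False
```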

The limitation of this data source is that it offers no authentication mechanism beyond the request headers you set yourself, and it supports no protocol other than HTTP. When that is not enough, another solution exists.

The external data source

“The external data source allows an external program implementing a specific protocol (defined below) to act as a data source, exposing arbitrary data for use elsewhere in the Terraform configuration.” (From the official Terraform documentation)

The data source needs the command that runs the program, given as a list in the program argument. Terraform passes the content of the query argument to the program as a JSON object on stdin. The program returns its result as a JSON object on stdout, in which all values must be strings. A successful run is expected to exit with status zero:

data "external" "example" {
  program = ["python", "${path.module}/example-data-source.py"]

  query = {
    # arbitrary map from strings to strings, passed
    # to the external program as the data query.
    id = "abc123"
  }
}

In this example, Terraform runs python with the path to example-data-source.py and writes {"id": "abc123"} to its stdin. The script answers with a JSON object on stdout and exits with code 0. On failure, it should print an error message to stderr and exit with a non-zero status, which Terraform reports as an error.
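Seen from the caller's side, the protocol can be sketched in a few lines of Python. This is not how Terraform is implemented; call_external_program and ECHO_PROGRAM are hypothetical names, and the stand-in program simply echoes the id it receives so that the example runs without any network access.

```python
import json
import subprocess
import sys


def call_external_program(program, query):
    """Mimic the caller: JSON query on stdin, JSON object of strings on stdout."""
    completed = subprocess.run(
        program,
        input=json.dumps(query),
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        # A non-zero exit status signals failure; stderr carries the message.
        raise RuntimeError(completed.stderr.strip())
    result = json.loads(completed.stdout)
    if not isinstance(result, dict) or not all(
        isinstance(value, str) for value in result.values()
    ):
        raise TypeError("the program must print a JSON object whose values are all strings")
    return result


# A tiny stand-in program that echoes the query id back (assumption for the demo).
ECHO_PROGRAM = [
    sys.executable,
    "-c",
    "import json, sys; q = json.load(sys.stdin); print(json.dumps({'id': q['id']}))",
]

print(call_external_program(ECHO_PROGRAM, {"id": "abc123"}))  # {'id': 'abc123'}
```

A helper like this can double as a quick check that a script respects the contract before it is wired into a Terraform configuration.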

It is up to you to take these specifications into account when developing your script. We will see how to implement all this!

Terraform code

variable "todo_id" {
  type = number
}

data "external" "todo" {
  program = ["python", "${path.module}/fetch_todo.py"]

  query = {
    id = var.todo_id
  }
}

locals {
  todo = data.external.todo.result
}

output "todo" {
  value = local.todo
}

External script

#!/usr/bin/env python3
# coding: utf-8

import sys
import json
import requests

def fetch():
    # Terraform writes the content of query to stdin as a JSON object
    input_json = sys.stdin.read()
    try:
        input_dict = json.loads(input_json)
        todo_id = input_dict.get('id')
        # Retrieve the TODO with the requested id from the JSONPlaceholder API
        response = requests.get(
            f'https://jsonplaceholder.typicode.com/todos/{todo_id}',
            timeout=10,
        )
        # Fail on HTTP error statuses instead of passing an error payload along
        response.raise_for_status()
        output_json = response.json()
        # Terraform expects a JSON object whose values are all strings
        output = json.dumps({str(key): str(value) for key, value in output_json.items()})
        # The result must be written to stdout
        sys.stdout.write(output)
    except (ValueError, requests.RequestException) as e:
        # sys.exit with a message prints it to stderr and exits with status 1
        sys.exit(str(e))

if __name__ == "__main__":
    fetch()

The terraform-external-data package provides a function decorator that takes care of reading the query from stdin and writing the result to stdout, so the fetch_todo.py script can be reduced to:

#!/usr/bin/env python3

from terraform_external_data import terraform_external_data
import requests

@terraform_external_data
def fetch(query):
    # Terraform requires the values you return be strings,
    # so terraform_external_data will error if they aren't.
    todo_id = query['id']
    response = requests.get(f'https://jsonplaceholder.typicode.com/todos/{todo_id}')
    output_json = response.json()
    return {str(key): str(value) for key, value in output_json.items()}


if __name__ == '__main__':
    fetch()

You can test the script from the command line by piping a query to it:

$ echo '{"id": 1}' | python fetch_todo.py
{"userId": "1", "id": "1", "title": "delectus aut autem", "completed": "False"}
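One detail in this output is worth noticing: completed comes back as "False", which is Python's spelling of the boolean, while JSON and Terraform write booleans in lowercase. If that matters for your configuration, one possible variant is to serialize non-string values with json.dumps instead of str. The helper name to_terraform_strings below is hypothetical, and this is a sketch rather than the way the scripts above are meant to be written.

```python
import json


def to_terraform_strings(data):
    """Convert every value of a dict to a string, keeping JSON spellings."""
    result = {}
    for key, value in data.items():
        if isinstance(value, str):
            result[str(key)] = value
        else:
            # json.dumps yields "false", "null" and "1" rather than "False" or "None".
            result[str(key)] = json.dumps(value)
    return result


todo = {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": False}
print(json.dumps(to_terraform_strings(todo)))
# {"userId": "1", "id": "1", "title": "delectus aut autem", "completed": "false"}
```

Nested lists or objects would also end up as JSON text in a single string, which keeps the result valid for the external data source.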

Conclusion

We have seen what data sources are and the benefits of using them in Terraform. They offer a way to connect your project to resources that live outside of it, and most providers ship with many of them.

If you need to communicate with APIs or resources that no provider covers, you can use the http and external data sources. This will save you the time of developing your own provider!

Infrastructure as Code

Guillaume Vincent

DevOps Engineer & AWS Certified Solution Architect. Cloud enthusiast and automation addict