Patrick Spafford Software Engineer

What is Infrastructure as Code?


October 14, 2023

7 min read

Definition

Infrastructure as Code (IaC) is one of many approaches to managing virtual infrastructure. It means writing a description of the desired state of your infrastructure in code for an IaC tool to consume. When your code runs, the IaC tool talks to the APIs of the infrastructure you defined to calculate the difference between the current state and the desired state. The IaC tool then makes calls to those APIs to give you the virtual infrastructure exactly as you described.
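
To make "the difference between the current state and the desired state" more concrete, here is a minimal Python sketch of that comparison. It is not how any particular IaC tool is implemented. The plan function, the resource names, and the dictionary layout are all assumptions made up for illustration.

```python
def plan(current: dict[str, dict], desired: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two states keyed by resource name and list the actions needed."""
    return {
        # Described in code but not present yet: needs to be created.
        "create": sorted(desired.keys() - current.keys()),
        # Present in both but configured differently: needs to be updated.
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
        # Present but no longer described in code: needs to be deleted.
        "delete": sorted(current.keys() - desired.keys()),
    }


# Hypothetical states: what the provider API reports vs. what the code describes.
current = {
    "vm-0": {"size": "small"},
    "vm-1": {"size": "small"},
}
desired = {
    "vm-0": {"size": "small"},
    "vm-1": {"size": "large"},
    "vm-2": {"size": "small"},
}

changes = plan(current, desired)
print(changes)  # {'create': ['vm-2'], 'update': ['vm-1'], 'delete': []}
```

A real tool would then turn each entry into API calls against the provider. The sketch stops at working out which calls are needed.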

Two common Infrastructure as Code tools are Terraform and Pulumi. With Terraform, you write in a domain-specific language called HashiCorp Configuration Language (HCL). With Pulumi, you write in any supported general-purpose programming language, such as TypeScript or Python.

Managing Virtual Infrastructure

To see why managing your virtual infrastructure this way is helpful, let’s imagine you need to create 3 virtual machines in your favorite cloud.

Manual

You log into the console of your cloud provider and use the UI to create your virtual machines. When you want to change them in the future, you figure out how to do it with the UI.

Advantages ✅

  • No code is required.
  • It’s good for simple, static infrastructure.
  • It’s hard to create more infrastructure than you need by mistake.

Disadvantages ❌

  • It’s tedious, particularly when creating/changing/deleting any of the following:
    • many of the same resource
    • many types of resources
    • resources across multiple accounts
  • It sometimes requires learning the ins and outs of the cloud provider’s console.
  • It’s hard to assess what infrastructure you currently have, especially if you have created more than one type of resource.
  • There’s no tracking of changes to your infra.
  • You need to be cautious when changes to resources need to be made in a certain order.
  • The infrastructure is not easily reproducible.
  • Your infrastructure may slowly drift into weird states after many changes.

Programmatic

You write a Python script that might look something like this:

#!/usr/bin/env python
import argparse

import requests  # HTTP client for calling the cloud provider's API


def create_virtual_machines(n: int):
    """Create n virtual machines through the cloud provider's API."""
    # ...


if __name__ == "__main__":
    try:
        parser = argparse.ArgumentParser()
        # required=True so that `machines` is never None
        parser.add_argument('--machines', '-m', type=int, required=True)
        args = parser.parse_args()
        machines = args.machines
        create_virtual_machines(machines)
        print(f"Successfully created {machines} virtual machines")
    except Exception as e:
        print(e)
        print("Failed to create all virtual machines.")

You might invoke your simple Python CLI like this:

./create_virtual_machines --machines 3

Let’s unpack the pros and cons of this approach.

Advantages ✅

  • It’s faster to manage resources if you find yourself doing the same operation repeatedly.
  • Assuming the script is under source control, it’s easier to infer what arguments were passed to the cloud provider’s API.
  • There’s no need to navigate the console / UI.
  • It’s easier to control the order of operations when changing your infra.
  • In theory, you could share the script with someone else and they could replicate the infrastructure you have.

Disadvantages ❌

  • It’s still hard to assess the current state of your infra.
  • There’s a high operational burden to develop a new function every time a non-routine change is needed.
  • It’s way too easy to create more resources than you need.
  • You may need to trudge through the API docs of the cloud provider.
  • Writing tests further compounds the operational burden.
  • Your Python script may throw an error in the middle of execution, leaving the state of your infrastructure in limbo.
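
Two of the disadvantages above, ending up in limbo after a mid-run error and creating more resources than you need, can be reproduced with a small simulation. This is only a sketch: FakeCloud is a hypothetical in-memory stand-in for a provider API, and the failure on the third call is simulated.

```python
import itertools


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider's API."""

    def __init__(self, fail_on_call=None):
        self.vms = []
        self.calls = 0
        self.fail_on_call = fail_on_call
        self._ids = itertools.count()

    def create_vm(self):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise RuntimeError("simulated API error")
        self.vms.append(f"vm-{next(self._ids)}")


def create_virtual_machines(cloud, n):
    # Imperative: always issues n create calls, whatever already exists.
    for _ in range(n):
        cloud.create_vm()


cloud = FakeCloud(fail_on_call=3)  # the third API call will fail

try:
    create_virtual_machines(cloud, 3)
except RuntimeError:
    pass
after_failure = len(cloud.vms)  # the script stopped partway through

create_virtual_machines(cloud, 3)  # naive retry of the same script
after_retry = len(cloud.vms)

print(after_failure, after_retry)  # 2 5
```

The script has no idea that two VMs already exist, so the retry overshoots the target of three. Fixing that means teaching the script to inspect the current state first, which is the job an IaC tool takes on for you.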

Infrastructure as Code

With IaC, here’s what you might do.

You choose Terraform as your IaC tool. After getting set up with Terraform, you write declarative HCL (HashiCorp Configuration Language) code that mirrors the infrastructure you want to have. You don’t specify “the how” of your virtual infrastructure. The concept of executing your program statement-by-statement or line-by-line is largely thrown out. Instead, your concern is to write a program that represents the final state of the infrastructure that you want to have after all is said and done (i.e., after all API calls are made to your cloud provider).

In the example code below, it’s perfectly possible for the third VM to be created before the first one. The VMs don’t have to be created in any particular order since no VM depends on another for its creation. Luckily, we don’t care about the order because all we want is for our infrastructure to be created as described.

After running terraform plan, we can see what the code would yield without having to actually create the resources. If the plan matches our expectations, then we can do what’s called an apply. The apply action is a command specific to Terraform. When we run terraform apply, Terraform compares the current state of our infrastructure to what we have written in code and makes the actual infrastructure “match” the infrastructure represented in code. It does this by making API calls to the cloud provider. In our case, there is only one cloud provider, but we could use any number of them if we wanted.

Let’s return to our specific scenario.

If we have no infrastructure and we run terraform apply, it will create 3 VMs. If the first attempt at creation spontaneously failed for 1 of our VMs, we could safely run terraform apply again without fear of having more than 3 VMs running. Further, if we manually deleted one of these VMs, we would see that the next time we ran terraform apply, it would try to re-create the missing VM. If we wanted to delete a VM with code, all we would need to do is comment out one of the resource blocks below and then rerun terraform apply.

// infrastructure.tf
resource "cloud_provider_vm" "my_vm_0" {
    image  = "ubuntu-18-04-x64"
    name   = "my-vm-0"
    region = "nyc1"
    size   = "s-1vcpu-1gb"
}

resource "cloud_provider_vm" "my_vm_1" {
    image  = "ubuntu-18-04-x64"
    name   = "my-vm-1"
    region = "nyc2"
    size   = "s-1vcpu-1gb"
}

resource "cloud_provider_vm" "my_vm_2" {
    image  = "ubuntu-18-04-x64"
    name   = "my-vm-2"
    region = "nyc3"
    size   = "s-1vcpu-1gb"
}
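
The rerun behavior described above (a repeated apply is harmless, a manually deleted VM comes back, and a removed resource block deletes its VM) can be simulated in a few lines of Python. This is a toy model rather than Terraform's actual algorithm: the apply function here is hypothetical, and sets of VM names stand in for both the code and the cloud.

```python
def apply(cloud: set[str], desired: set[str]) -> None:
    """Make the set of running VM names match the desired set, in place."""
    for name in desired - cloud:
        cloud.add(name)  # stands in for a "create VM" API call
    for name in cloud - desired:
        cloud.discard(name)  # stands in for a "delete VM" API call


desired = {"my-vm-0", "my-vm-1", "my-vm-2"}
cloud: set[str] = set()  # no infrastructure yet

apply(cloud, desired)
apply(cloud, desired)  # rerunning changes nothing
count_after_rerun = len(cloud)

cloud.discard("my-vm-1")  # someone deletes a VM by hand
apply(cloud, desired)  # the next apply re-creates it
recreated = "my-vm-1" in cloud

# The equivalent of commenting out the my_vm_2 resource block.
desired_without_vm_2 = desired - {"my-vm-2"}
apply(cloud, desired_without_vm_2)

print(count_after_rerun, recreated, sorted(cloud))
# 3 True ['my-vm-0', 'my-vm-1']
```

Because apply only acts on the difference between the two sets, running it any number of times converges on the same result.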

Advantages ✅

  • It’s easier to see what infrastructure you have and how it is configured.
  • With commands like terraform plan, it’s possible to know what infrastructure changes will be made before you make them.
  • You can version control the code.
  • It’s harder to create more infrastructure than you need.
  • You don’t have to learn the user interface of the cloud provider.
  • If something fails unexpectedly, you can rerun your code (or, depending on the error, change your code) to get your infrastructure to the desired final state.
  • You can create many resources in many places in parallel.
  • When there are dependencies (resource A needs to be created before resource B), you can control exactly the order in which infrastructure is created or destroyed. In Terraform, the depends_on meta-argument is one way to declare such a dependency.
  • Infrastructure is more easily replicable and transferable because you can share your exact infrastructure configuration with someone else.
  • You can use many cloud providers at the same time without jumping from provider UI to provider UI.
  • You can write tests of your infrastructure code. However, your mileage may vary depending on the IaC tool that you select.
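
The point above about dependencies comes down to ordering a graph: each resource is created only after everything it depends on exists, and destroyed in the reverse order. Here is a minimal sketch using Python's standard-library graphlib module. The resource names and their dependencies are hypothetical, and a real IaC tool does considerably more than this.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the resources it depends on.
dependencies = {
    "network": set(),
    "database": {"network"},
    "vm": {"network", "database"},
}

# Dependencies come first when creating...
creation_order = list(TopologicalSorter(dependencies).static_order())
# ...and last when destroying.
destruction_order = list(reversed(creation_order))

print(creation_order)     # ['network', 'database', 'vm']
print(destruction_order)  # ['vm', 'database', 'network']
```

Resources with no path between them in the graph, like the three VMs in the example, can be handled in any order or in parallel.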

Disadvantages ❌

  • There is some overhead in learning the IaC tool itself.
  • If you aren’t the only one managing the infrastructure, it may be tricky to set up permissions. For instance, you might want a colleague to be able to make changes to certain resources without giving them the “keys to the kingdom.”
  • You will need to browse provider-specific documentation. In the case of Terraform, you have to browse the Terraform Registry for APIs that have Terraform providers.
  • If you have your own API that you would like to be compatible with IaC tools, you’ll need to build a provider.

Wrapping Up

Infrastructure as Code is a powerful way to manage your virtual infrastructure. However, it is not a silver bullet and whether you should use it depends on your use case.

Glossary

Virtual infrastructure: any “thing” that you can create by talking to an API (e.g., a virtual machine, a DNS record, a database, a tweet)

Terraform provider: an adapter around a vendor-specific API that allows Terraform to read, create, update, and delete resources that the vendor API provides. Typically written in Go.