Terraform

Rationale

We use Terraform to write our infrastructure stack as code.

The main reasons we chose it over the alternatives are:

  1. It is open source.
  2. It is widely used by the community.
  3. It uses HCL, an easy-to-learn structured configuration language.
  4. It is not tied to a single platform.
  5. It has a declarative, agentless approach to infrastructure. There are no master machines or agents. Instead, the desired infrastructure is described in code, and Terraform computes and applies the changes needed to reach it.
  6. Because every environment is generated from the same code, it helps keep development and production environments at parity.
  7. It has hundreds of open source providers that give it full flexibility across many platforms.
  8. It has thousands of open source modules that simplify writing infrastructure code and help avoid repetition.
  9. Deploying infrastructure usually takes no longer than a few minutes.

Alternatives

The following alternatives were considered but not chosen, for the reasons given:

  1. Ansible: Deployments were too slow.
  2. AWS CDK: It is tied to AWS.
  3. AWS CloudFormation: It is tied to AWS.
  4. Chef: It has a stateful approach to infrastructure, including a master machine, agents and mutable infrastructure.
  5. Pulumi: It is not as widely used, resulting in fewer providers and modules and less overall community support.
  6. Puppet: It has a stateful approach to infrastructure, including a master machine, agents and mutable infrastructure.
  7. SaltStack: It has a stateful approach to infrastructure, including a master machine, agents and mutable infrastructure.

Usage

It is used for infrastructure components such as databases, DNS records, firewall rules, and computing clusters. Some examples are:

  1. GitLab Runners.
  2. DNS.
  3. Kubernetes.
  4. Okta.
  5. Website.

We do not use Terraform in:

  1. AWS Redshift: Not yet implemented.
  2. GitLab: Not yet implemented.
  3. GitLab Runner Bastion: Not yet implemented.
  4. Google Workspace: Not yet implemented.

Guidelines

  1. Test an infrastructure module with ./m <product>.<module>.test
  2. Deploy an infrastructure module with ./m <product>.<module>.apply

Terraform state lock

The Terraform state file maps our infrastructure configuration to the real-world resources it manages. Terraform uses it to determine which changes must be made in the real world when running terraform apply. This state file is shared among team members to ensure consistency; however, if it is not properly locked, concurrent runs can lead to data loss, conflicts, and state file corruption.
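The idea behind state locking can be illustrated with a minimal in-memory sketch. It assumes a simplified model of the DynamoDB lock table (one lock item per state file, which can only be written if none exists yet) and is not Terraform's actual implementation; LockTable, acquire, release, and LockHeldError are hypothetical names.

```python
"""Minimal in-memory model of a Terraform-style state lock.

Assumption: this only mimics the idea behind the DynamoDB lock table
(one item per state file, written only if none exists). It is not
Terraform's actual implementation.
"""
import uuid
from typing import Dict


class LockHeldError(Exception):
    """Raised when another run already holds the lock."""


class LockTable:
    def __init__(self) -> None:
        # Maps a state path (e.g. "bucket/app.tfstate") to the lock ID holding it.
        self._items: Dict[str, str] = {}

    def acquire(self, state_path: str) -> str:
        # Conditional write: succeed only if no lock item exists for this path.
        if state_path in self._items:
            raise LockHeldError(
                f"{state_path} is locked by {self._items[state_path]}"
            )
        lock_id = str(uuid.uuid4())
        self._items[state_path] = lock_id
        return lock_id

    def release(self, state_path: str, lock_id: str) -> None:
        # Only the run holding this lock ID may release it.
        if self._items.get(state_path) == lock_id:
            del self._items[state_path]

    def is_locked(self, state_path: str) -> bool:
        return state_path in self._items
```

For example, a run that calls acquire and then crashes before calling release leaves its item in the table, so every later acquire for the same state path raises LockHeldError until that item is removed.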

If a job fails or is interrupted while holding the lock, the lock may not be released and will block later runs. To remove such a stale lock, follow the steps below:

  1. Obtain the state lock ID from the logs of the failed job.
  2. Open the terraform_state_lock table in DynamoDB by signing in to AWS - production through Okta (requires the prod_integrates role).
  3. Find the item whose Info attribute contains that ID, and delete that .tfstate item.
  4. Rerun the failed job.
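Step 3 can be sketched in Python, assuming the table follows the layout used by Terraform's S3 backend: each item is keyed by LockID, and the lock item for a state file stores a JSON document with an ID field in its Info attribute, while a separate digest item (key ending in -md5) has no Info. find_lock_item is a hypothetical helper that works on items already fetched from the table, for example through a scan, and returns the key of the .tfstate item to delete.

```python
"""Locate the stale lock item for a given lock ID.

Assumption: items are plain dicts from a deserialized DynamoDB scan of
terraform_state_lock, each with a "LockID" key and, for lock items, an
"Info" attribute holding JSON with an "ID" field (the layout used by
Terraform's S3 backend).
"""
import json
from typing import Iterable, Optional


def find_lock_item(items: Iterable[dict], lock_id: str) -> Optional[str]:
    """Return the LockID of the .tfstate item held by lock_id, or None."""
    for item in items:
        key = item.get("LockID", "")
        raw_info = item.get("Info")
        # Skip digest items (e.g. "...tfstate-md5") and items without lock info.
        if not key.endswith(".tfstate") or not raw_info:
            continue
        try:
            info = json.loads(raw_info)
        except (json.JSONDecodeError, TypeError):
            # Ignore items whose Info is not a JSON string.
            continue
        if isinstance(info, dict) and info.get("ID") == lock_id:
            return key
    return None
```

The returned key could then be removed with boto3, for example table.delete_item(Key={"LockID": key}) on a DynamoDB Table resource, which does the same as deleting the item in the AWS console.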