Terraform
Rationale
Terraform is used for writing our entire infrastructure stack as code.
The main reasons why we chose it over other alternatives are:
- It is open source.
- It is widely used by the community.
- It uses HCL, a structured configuration language that is very easy to learn.
- It is not bound to a single platform.
- It has a stateless approach to infrastructure. There are no master machines, agents, or incremental infrastructure. Instead, infrastructure is regenerated from scratch every time it is required.
- Due to its stateless approach, parity between development and production environments is assured.
- It has hundreds of open source providers that give it full flexibility across many platforms.
- It has thousands of open source modules that simplify writing infrastructure and help avoid repetition.
- Deploying infrastructure usually takes no longer than a few minutes.
Alternatives
The following alternatives were considered but not chosen, for these reasons:
- Ansible: Deployments were too slow.
- AWS CDK: It is bound to a single platform (AWS).
- AWS CloudFormation: It is bound to a single platform (AWS).
- Chef: It has a stateful approach to infrastructure, including a master machine, agents and mutable infrastructure.
- Pulumi: It is not as widely used, resulting in fewer providers and modules and less overall community support.
- Puppet: It has a stateful approach to infrastructure, including a master machine, agents and mutable infrastructure.
- SaltStack: It has a stateful approach to infrastructure, including a master machine, agents and mutable infrastructure.
Usage
Terraform is used for every piece of infrastructure, such as databases, DNS records, firewall rules and computing clusters.
We do not yet use Terraform for:
- AWS Redshift: Not yet implemented.
- GitLab: Not yet implemented.
- GitLab Runner Bastion: Not yet implemented.
- Google Workspace: Not yet implemented.
Guidelines
- Test an infrastructure module with: ./m <product>.<module>.test
- Deploy an infrastructure module with: ./m <product>.<module>.apply
Terraform state lock
The Terraform state file stores information about our infrastructure and its configuration. Terraform uses it to determine which changes must be made in the real world when running terraform apply.
This state file is shared among team members to ensure consistency. However, if it is not properly locked, concurrent runs can lead to data loss, conflicts and state file corruption.
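To illustrate why locking matters, the following Python sketch models a lock table in memory. It is a simplified, hypothetical model: the InMemoryLockTable and LockConflict names are illustrative, and the Info payload carries only a subset of fields. It is not Terraform's implementation, although the "write only if no lock exists yet" check is similar in spirit to how a DynamoDB lock table is used.

```python
import json
import uuid
from datetime import datetime, timezone


class LockConflict(Exception):
    """Raised when another run already holds the lock for a state file."""


class InMemoryLockTable:
    """Hypothetical in-memory stand-in for a lock table such as terraform_state_lock.

    Items are keyed by LockID (the state file path) and store an Info JSON string.
    """

    def __init__(self) -> None:
        self.items: dict[str, dict[str, str]] = {}

    def acquire(self, state_path: str, who: str) -> str:
        # Conditional write: succeed only if no lock item exists for this path.
        if state_path in self.items:
            held = json.loads(self.items[state_path]["Info"])
            raise LockConflict(f"state locked by {held['Who']} (ID {held['ID']})")
        lock_id = str(uuid.uuid4())
        info = {
            "ID": lock_id,
            "Who": who,
            "Created": datetime.now(timezone.utc).isoformat(),
        }
        self.items[state_path] = {"LockID": state_path, "Info": json.dumps(info)}
        return lock_id

    def release(self, state_path: str, lock_id: str) -> None:
        # Only the holder of the matching lock ID may release the lock.
        item = self.items.get(state_path)
        if item is None or json.loads(item["Info"])["ID"] != lock_id:
            raise LockConflict("lock is not held with this ID")
        del self.items[state_path]


if __name__ == "__main__":
    table = InMemoryLockTable()
    path = "example-bucket/product/module/terraform.tfstate"  # hypothetical path
    first = table.acquire(path, "job-1")
    try:
        table.acquire(path, "job-2")  # a second, concurrent apply is rejected
    except LockConflict as err:
        print(err)
    table.release(path, first)
```

If job-1 crashed before calling release, its item would remain in the table and block every later run; that stale lock is what the steps below remove.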
In case of conflicts with the state file, please follow the steps below:
- Obtain the state lock ID from the failed job.
- Access the terraform_state_lock table in DynamoDB by going to AWS - production in Okta (requires the prod_integrates role).
- Search for the ID in the Info attribute and delete the .tfstate item.
- Rerun the job that failed.
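The cleanup steps above could also be scripted. The sketch below assumes Python with boto3 installed and AWS credentials for the production account already loaded (for example, from the Okta session); find_lock_item and delete_stale_lock are hypothetical helper names. It also assumes the lock item's LockID ends in .tfstate and its Info attribute is a JSON string with an ID field, matching the steps above. Other items in the table, such as digest entries whose LockID ends in -md5, are skipped.

```python
import json
from typing import Iterable, Optional


def find_lock_item(items: Iterable[dict], lock_id: str) -> Optional[str]:
    """Return the LockID of the .tfstate item whose Info contains lock_id, if any."""
    for item in items:
        key = item.get("LockID", "")
        info = item.get("Info")
        # Skip digest entries (e.g. "...-md5") and items without lock info.
        if not key.endswith(".tfstate") or not info:
            continue
        if json.loads(info).get("ID") == lock_id:
            return key
    return None


def delete_stale_lock(lock_id: str, table_name: str = "terraform_state_lock") -> bool:
    """Scan the lock table, delete the matching .tfstate item and report success."""
    import boto3  # imported lazily so find_lock_item works without boto3

    table = boto3.resource("dynamodb").Table(table_name)
    items, kwargs = [], {}
    # Scan results are paginated; keep reading until no LastEvaluatedKey remains.
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    key = find_lock_item(items, lock_id)
    if key is None:
        return False
    table.delete_item(Key={"LockID": key})
    return True
```

A call such as delete_stale_lock("<lock ID from the failed job>") returns True when an item was deleted, after which the failed job can be rerun.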