Elastic Block Store (EBS)

Rationale

AWS EBS is the service we use for block-level storage. It gives us virtual hard drives in the cloud.

The main reasons why we chose it over other alternatives are:

  1. It integrates seamlessly with AWS EC2, allowing us to attach external hard drives to instances.
  2. It provides a wide range of disk types, from SSDs of up to 64 TiB with a throughput of up to 4,000 MiB/s to HDDs of up to 16 TiB with a throughput of up to 500 MiB/s.
  3. Disks are also divided into specializations: General Purpose and Provisioned IOPS SSDs, and Throughput Optimized and Cold HDDs. With these types available, we can easily select the one that fits the nature of the problem we are trying to solve.
  4. It supports point-in-time snapshots designed for backing up all the data that exists within a disk.
  5. Disks can be easily attached to and detached from AWS EC2 machines, allowing us to change general machine configurations without losing any data.
  6. Disks can be encrypted using AWS KMS keys. Encryption covers data at rest inside the volume, data moving between the disk and the instance that is using it, disk snapshots, and all volumes created from those snapshots.
  7. It supports data lifecycle policies, allowing us to create, retain, and delete disks based on defined policies.
  8. It supports monitoring and metrics using AWS CloudWatch.
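
As a rough illustration of items 2, 4, and 6, the sketch below builds the keyword arguments that could be passed to the `create_volume` and `create_snapshot` methods of the boto3 EC2 client. It is a minimal sketch and not our actual provisioning code: the helper names, the availability zone, and the key alias are hypothetical, and boto3 itself is only referenced in comments so the block runs without AWS credentials.

```python
from typing import Any, Dict, Optional

# EBS volume type identifiers for the four specializations listed above.
VOLUME_TYPES = {
    "gp2", "gp3",   # General Purpose SSD
    "io1", "io2",   # Provisioned IOPS SSD
    "st1",          # Throughput Optimized HDD
    "sc1",          # Cold HDD
}


def create_volume_params(
    availability_zone: str,
    size_gib: int,
    volume_type: str = "gp3",
    kms_key_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an encrypted EBS volume (hypothetical helper)."""
    if volume_type not in VOLUME_TYPES:
        raise ValueError(f"unknown volume type: {volume_type}")
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    params: Dict[str, Any] = {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,          # size in GiB
        "VolumeType": volume_type,
        "Encrypted": True,         # always request encryption at rest
    }
    # Without KmsKeyId, AWS falls back to the account's default EBS key.
    if kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id
    return params


def create_snapshot_params(volume_id: str, description: str) -> Dict[str, Any]:
    """Build keyword arguments for a point-in-time snapshot (hypothetical helper)."""
    return {"VolumeId": volume_id, "Description": description}


# Assumed usage with boto3 (not executed here):
#   import boto3
#   ec2 = boto3.client("ec2")
#   volume = ec2.create_volume(**create_volume_params("us-east-1a", 50))
#   ec2.create_snapshot(**create_snapshot_params(volume["VolumeId"], "backup"))
```

Keeping the parameter construction separate from the API call makes the encryption default easy to check in isolation. In practice, changes of this kind go through Terraform rather than ad hoc scripts.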

Alternatives

  1. Google Compute Engine: It did not exist at the time we migrated to the cloud. GCP does not offer a standalone equivalent to EBS. Instead, its entire disk service exists within GCE. It does not support disk encryption.
  2. Azure Disk Storage: It did not exist at the time we migrated to the cloud. Pending review.

Usage

We use AWS EBS for:

  1. GitLab CI bastion: We use a 16 GiB GP2 disk, as it only needs basic software installed, like GitLab Runner and Docker Machine. High disk throughput is not required.
  2. GitLab CI workers: We use 10 GiB GP3 disks only for hosting our workers' operating system. Additionally, workers come with high-throughput 50 GiB internal NVMe disks, which help our CI jobs run as fast as possible.
  3. Batch processing workers: As with our CI workers, we use 8 GiB GP2 disks only for hosting the operating system. These workers also come with 50 GiB internal NVMe disks.
  4. Kubernetes cluster workers: We use 50 GiB GP2 disks for hosting the base operating system and the stored containers for applications like our ASM. High disk throughput is not required, as our ASM does not store any data on local disks.
  5. Okta RADIUS Agent: We use a 50 GiB GP2 disk. It is probably oversized, as only the base operating system and the RADIUS agent are required. High disk throughput is not required.
  6. ERP: We use two disks: a 50 GiB GP2 disk for hosting the base operating system and a 200 GiB GP2 disk for hosting the ERP data.
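
The list above can also be read as a small per-instance inventory. The following sketch encodes it as data and sums the EBS storage attached to a single instance of each role. The `Volume` class and `gib_per_role` function are illustrative names, not part of our infrastructure code, and the internal NVMe disks are left out because they are not EBS volumes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Volume:
    role: str          # machine role the disk is attached to
    size_gib: int      # provisioned size in GiB
    volume_type: str   # EBS volume type
    purpose: str       # what the disk hosts


# One entry per EBS disk attached to a single instance of each role.
INVENTORY: List[Volume] = [
    Volume("GitLab CI bastion", 16, "gp2", "OS and tooling"),
    Volume("GitLab CI worker", 10, "gp3", "OS"),
    Volume("Batch processing worker", 8, "gp2", "OS"),
    Volume("Kubernetes cluster worker", 50, "gp2", "OS and containers"),
    Volume("Okta RADIUS Agent", 50, "gp2", "OS and agent"),
    Volume("ERP", 50, "gp2", "OS"),
    Volume("ERP", 200, "gp2", "ERP data"),
]


def gib_per_role(volumes: List[Volume]) -> Dict[str, int]:
    """Sum the provisioned EBS size per role, for a single instance."""
    totals: Dict[str, int] = {}
    for volume in volumes:
        totals[volume.role] = totals.get(volume.role, 0) + volume.size_gib
    return totals
```

For example, `gib_per_role(INVENTORY)["ERP"]` adds the 50 GiB system disk and the 200 GiB data disk of the ERP machine.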

Guidelines

  1. You can access the AWS EBS console after authenticating on AWS.
  2. Any changes to EBS infrastructure must be made via Merge Requests.
  3. To learn how to test and apply infrastructure via Terraform, visit the Terraform Guidelines.