Skip to main content

Redshift

Rationale#

We use Redshift as a data warehouse for all our analytics processes.

The main reasons why we chose it over other alternatives are:

  1. It is designed for online analytic processing (OLAP), which grants complete flexibility for executing complex queries against large datasets. This requirement is a must in order to be able to answer all kinds of business-related questions based on our data.
  2. It complies with several certifications from ISO and CSA. Many of these certifications are focused on granting that the entity follows best practices regarding secure cloud-based environments and information security.
  3. It supports clustering, allowing to distribute data accross nodes, granting horizontal autoscaling capabilities.
  4. Its princing model is infrastrucutre-based, meaning that you pay for the size and number of nodes your cluster has. Such approach makes it very cheap when compared to other SaaS data warehouses.
  5. It creates incremental snapshots of your data every eight hours, allowing you to revert to a previous state in case the need arises.
  6. Although Redshift is not Open source, it is supported by PostgreSQL, allowing us to locally simulate a Redshift-like databases for testing.
  7. It is supported by ChartIO, our analytics visualization tool.
  8. It can be partially managed (tables not supported) as code using Terraform.
  9. It supports encryption at rest using KMS.
  10. It fully integrates with IAM, allowing to keep a least privilege approach regarding authentication and authorization.
  11. It supports VPC security groups, allowing to specify networking inbound and outbound rules for IP addresses, ports and other security groups.
  12. Cluster nodes performance can be monitored via CloudWatch.

Alternatives#

  1. AWS Athena: It is a SaaS database, meaning that no infrastructure maintainance is required. Its pricing model is based on the number of TBs of data scaned by each query, which makes it considerably more expensive in the long term. Pending to review.
  2. Google BigQuery: Pending to review.
  3. Snowflake: Pending to review.

Usage#

  1. We use Redshift for storing data from many of our services and then visualizing it using ChartIO.
  2. Our Redshift architecture is not documented. Pending to implement.
  3. Our Redshift cluster is not written as code using Terraform. Pending to implement.
  4. Our Redshift is not encrypted at rest using KMS. Pending to implement.

Guidelines#

You can access the Redshift console after authenticating on AWS.