Skip to main content

Redshift

Rationale

We use Redshift as a data warehouse for all our analytics processes.

The main reasons why we chose it over other alternatives are:

  1. It is designed for online analytic processing (OLAP), which grants complete flexibility for executing complex queries against large datasets. This requirement is a must in order to be able to answer all kinds of business-related questions based on our data.
  2. It complies with several certifications from ISO and CSA. Many of these certifications are focused on granting that the entity follows best practices regarding secure cloud-based environments and information security.
  3. It supports clustering, allowing to distribute data across nodes, granting horizontal autoscaling capabilities.
  4. Its pricing model is infrastructure-based, meaning that you pay for the size and number of nodes your cluster has. Such approach makes it very cheap when compared to other SaaS data warehouses.
  5. It creates incremental snapshots of your data every eight hours, allowing you to revert to a previous state in case the need arises.
  6. Although Redshift is not Open source, it is supported by PostgreSQL, allowing us to locally simulate a Redshift-like databases for testing.
  7. It is supported by ChartIO, our analytics visualization tool.
  8. It can be partially managed (tables not supported) as code using Terraform.
  9. It supports encryption at rest using KMS.
  10. It fully integrates with IAM, allowing to keep a least privilege approach regarding authentication and authorization.
  11. It supports VPC security groups, allowing to specify networking inbound and outbound rules for IP addresses, ports and other security groups.
  12. Cluster nodes performance can be monitored via CloudWatch.

Alternatives

  1. AWS Athena: It is a SaaS database, meaning that no infrastructure maintenance is required. Its pricing model is based on the number of TBs of data scanned by each query, which makes it considerably more expensive in the long term. Pending to review.
  2. Google BigQuery: Pending to review.
  3. Snowflake: Very similar to redshift; it offers no infrastructure maintenance and high scalability. The pricing model is pay-by-use increasing the costs when the database is at high pressure (queried very often).

Usage

  1. We use Redshift for storing data from many of our services and then visualizing it using Grow.
  2. Our Redshift architecture is not documented. Pending to implement.
  3. Our Redshift cluster is written as code using Terraform.
  4. Our Redshift is encrypted at rest using KMS.

Guidelines

You can access the Redshift console after authenticating on AWS.