Redshift
Rationale
We use Redshift as a data warehouse for all our analytics processes.
The main reasons why we chose it over other alternatives are:
- It is designed for online analytic processing (OLAP), which grants complete flexibility for executing complex queries against large datasets. This is a must for answering all kinds of business-related questions based on our data.
- It complies with several certifications from ISO and CSA. Many of these certifications focus on ensuring that the entity follows best practices regarding secure cloud-based environments and information security.
- It supports clustering, which allows data to be distributed across nodes and provides horizontal autoscaling capabilities.
- Its pricing model is infrastructure-based, meaning that you pay for the size and number of nodes in your cluster. This approach makes it very cheap compared to other SaaS data warehouses.
- It creates incremental snapshots of your data every eight hours, allowing you to revert to a previous state in case the need arises.
- Although Redshift is not open source, it is based on PostgreSQL, allowing us to locally simulate Redshift-like databases for testing.
- It is supported by ChartIO, our analytics visualization tool.
- It can be partially managed (tables not supported) as code using Terraform.
- It supports encryption at rest using KMS.
- It fully integrates with IAM, allowing us to keep a least privilege approach regarding authentication and authorization.
- It supports VPC security groups, allowing us to specify inbound and outbound networking rules for IP addresses, ports and other security groups.
- Cluster node performance can be monitored via CloudWatch.
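As an illustration of the least privilege point above, the following is a minimal sketch of an IAM policy that only allows requesting temporary credentials for one database user on one database. The function name, region, account ID, cluster name, user and database are all hypothetical placeholders, not values taken from our infrastructure.

```python
import json


def build_redshift_credentials_policy(region, account_id, cluster, db_user, db_name):
    """Build an IAM policy document scoped to one Redshift user and one database.

    All arguments are illustrative placeholders supplied by the caller.
    """
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the action needed to obtain temporary database credentials.
                "Action": ["redshift:GetClusterCredentials"],
                # No wildcards: one database user and one database on one cluster.
                "Resource": [
                    f"{prefix}:dbuser:{cluster}/{db_user}",
                    f"{prefix}:dbname:{cluster}/{db_name}",
                ],
            }
        ],
    }


# Hypothetical values, used only to show the shape of the resulting document.
policy = build_redshift_credentials_policy(
    region="us-east-1",
    account_id="123456789012",
    cluster="analytics",
    db_user="readonly_analyst",
    db_name="warehouse",
)
print(json.dumps(policy, indent=2))
```

Keeping the resources explicit rather than using `*` is what makes the policy least privilege: a leaked role can only reach the single user and database listed.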
Alternatives
- AWS Athena: It is a SaaS database, meaning that no infrastructure maintenance is required. Its pricing model is based on the number of TBs of data scanned by each query, which makes it considerably more expensive in the long term. Pending to review.
- Google BigQuery: Pending to review.
- Snowflake: Very similar to Redshift; it offers no infrastructure maintenance and high scalability. The pricing model is pay-per-use, which increases costs when the database is under heavy load (queried very often).
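The trade-off between node-based and scan-based pricing can be sketched with some simple arithmetic. The prices and workload figures below are hypothetical and only illustrate the shape of the comparison; they are not actual AWS prices nor measurements of our workload.

```python
def node_based_monthly_cost(nodes, price_per_node_hour, hours_per_month=730):
    """Fixed monthly cost: you pay for the nodes regardless of query volume."""
    return nodes * price_per_node_hour * hours_per_month


def scan_based_monthly_cost(queries_per_month, tb_scanned_per_query, price_per_tb):
    """Variable monthly cost: you pay for the data scanned by each query."""
    return queries_per_month * tb_scanned_per_query * price_per_tb


def break_even_queries(nodes, price_per_node_hour, tb_scanned_per_query,
                       price_per_tb, hours_per_month=730):
    """Monthly query count above which node-based pricing becomes cheaper."""
    fixed = node_based_monthly_cost(nodes, price_per_node_hour, hours_per_month)
    return fixed / (tb_scanned_per_query * price_per_tb)


# Hypothetical figures: 2 nodes at 0.25 per node-hour versus
# queries scanning 0.05 TB each at 5.0 per TB scanned.
fixed_cost = node_based_monthly_cost(nodes=2, price_per_node_hour=0.25)
threshold = break_even_queries(
    nodes=2, price_per_node_hour=0.25, tb_scanned_per_query=0.05, price_per_tb=5.0
)
print(f"node-based cost per month: {fixed_cost:.2f}")
print(f"break-even queries per month: {threshold:.0f}")
```

Under these assumed numbers, a workload below the break-even query count would be cheaper on a scan-based model, while a heavily queried warehouse favors paying for nodes.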
Usage
- We use Redshift for storing data from many of our services and then visualizing it using Grow.
- Our Redshift architecture is not documented. Pending to implement.
- Our Redshift cluster is written as code using Terraform.
- Our Redshift is encrypted at rest using KMS.
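One way to verify the encryption claim above is to inspect the cluster descriptions returned by the AWS API. The sketch below assumes a dictionary shaped like the response of boto3's `describe_clusters` call for Redshift; the function name and the sample cluster data are hypothetical.

```python
def find_clusters_without_kms(describe_clusters_response):
    """Return identifiers of clusters that are not encrypted at rest with a KMS key."""
    flagged = []
    for cluster in describe_clusters_response.get("Clusters", []):
        # A cluster passes only if encryption is enabled and a KMS key is attached.
        if not cluster.get("Encrypted") or not cluster.get("KmsKeyId"):
            flagged.append(cluster["ClusterIdentifier"])
    return flagged


# Hand-written sample with hypothetical values, shaped like the API response.
sample_response = {
    "Clusters": [
        {
            "ClusterIdentifier": "analytics",
            "Encrypted": True,
            "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/example-key-id",
        },
        {"ClusterIdentifier": "scratch", "Encrypted": False},
    ]
}
print(find_clusters_without_kms(sample_response))
```

With boto3 installed and AWS credentials configured, the same function could be fed the result of `boto3.client("redshift").describe_clusters()` instead of the sample dictionary.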
Guidelines
You can access the Redshift console after authenticating on AWS.