Checkly

Rationale

Checkly is our monitoring-as-code (MaC) tool for check-based tracking across our products. As a pivotal part of our continuous monitoring strategy, Checkly allows us to define and manage periodic checks for each of our components so that we know their state at all times.

The main reasons why we chose it over other alternatives are:

  1. It provides a global network of testing locations, allowing us to simulate user interactions from various geographical locations.
  2. It supports real browser checks, allowing us to test our web applications in a way that closely mimics user interactions.
  3. All the monitoring setup and tests can be configured and managed using Terraform.
  4. It provides dedicated API testing.
  5. It brings detailed metrics and insights into the performance of our products, helping to identify trends and areas for improvement.
  6. It provides a public dashboard that showcases our components' health and performance metrics. This feature is valuable for enhancing transparency and communication.
  7. It offers a variety of integrations, allowing us to connect different elements of the stack to keep track of failures, inform customers about the state of our components, and resolve issues quickly.
  8. It offers configurable retry strategies that minimize false positives during monitoring.
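To illustrate the last point, the following is a minimal conceptual sketch of how a retry strategy filters out transient failures: a check is reported as failed only if it still fails after every retry. The function names, the strategy names, and the backoff formulas are illustrative assumptions, not Checkly's implementation or API.

```python
import time
from typing import Callable, List


def backoff_delays(strategy: str, base_seconds: int, max_retries: int) -> List[int]:
    """Return the wait (in seconds) before each retry attempt.

    The three strategies and their formulas are assumptions chosen for
    illustration; they do not describe Checkly's internals.
    """
    if strategy == "fixed":
        # Same wait before every retry.
        return [base_seconds] * max_retries
    if strategy == "linear":
        # Wait grows by base_seconds on each retry.
        return [base_seconds * attempt for attempt in range(1, max_retries + 1)]
    if strategy == "exponential":
        # Wait doubles on each retry.
        return [base_seconds * 2 ** (attempt - 1) for attempt in range(1, max_retries + 1)]
    raise ValueError(f"unknown strategy: {strategy}")


def run_with_retries(
    check: Callable[[], bool],
    delays: List[int],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a check and retry it after each delay until it passes.

    Returns True as soon as one attempt passes, and False only when the
    initial attempt and all retries fail.
    """
    if check():
        return True
    for delay in delays:
        sleep(delay)
        if check():
            return True
    return False
```

With this sketch, a check that fails once because of a network hiccup and then passes on its first retry is never reported as a failure, while a component that is really down still fails after the last retry.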

Alternatives

We considered the following alternatives but did not choose them, for the reasons given:

  1. Atatus Synthetic Monitoring: It does not support configuring checks and tests with Terraform. Its checks are also not very customizable, and its integrations with other services and tools are limited.
  2. BetterStack: It does not offer checks that interact directly with the browser, and it cannot run tests in multiple browsers or run checks in parallel.
  3. DataDog: Its monitoring features do not provide more value than Checkly's, and using it only for this purpose is more expensive than the alternatives.

Usage

We use Checkly for:

  1. Setting up and running browser checks for our Docs, Platform, and Web page.
  2. Setting up and running API checks for our API and the DevSecOps Agent.
  3. Setting up and managing check groups.
  4. Defining retry strategies to avoid false positive check failures.
  5. Generating automatic incidents on our Statuspage when a check fails. Refer to the Status section for more information about this process.
  6. Setting up and displaying the public dashboard of our components' health.
  7. Notifying the staff of any check failure through various channels.

We do not use Checkly for:

  1. Heartbeats: This is a new Checkly feature, and it has yet to be tested by the team.
  2. Maintenance windows: Maintenance periods are not needed due to our deployment frequency and workflow.

Guidelines

  1. Any change to Checkly's infrastructure must be made through a Merge Request that modifies its Terraform module.
  2. To learn how to test and apply infrastructure via Terraform, visit the Terraform Guidelines.