
CloudWatch

Rationale

We use CloudWatch for monitoring all our AWS infrastructure. It allows us to monitor our applications, react to performance changes within those applications, optimize resource utilization, and get a unified view of operational health.

The main reasons why we chose it over other alternatives are:

  1. It is a core AWS service. As soon as infrastructure is created, CloudWatch starts monitoring it.
  2. It seamlessly integrates with most AWS services. Some examples are EC2, S3, and DynamoDB.
  3. It complies with several ISO and CSA certifications. Many of these certifications focus on ensuring that the certified entity follows best practices for secure cloud-based environments and information security.
  4. It supports custom dashboards for visualizing metrics with widgets such as bar charts, pie charts, and single numbers, among others. Other customizations, such as time ranges and using resource metrics as axes, are also available.
  5. It supports alarms through AWS SNS, which can trigger email notifications when a resource metric breaches a defined condition or an anomaly is detected.
  6. Resources can be written as code using Terraform.
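As a rough sketch of how an alarm wired to SNS fits together, the function below builds the keyword arguments for boto3's CloudWatch `put_metric_alarm` call. It assumes an EC2 CPU alarm; the function name, alarm naming scheme, instance ID, topic ARN, and 80% threshold are hypothetical placeholders, not our actual configuration. It only builds the request, so it runs without AWS credentials.

```python
from typing import Any


def build_cpu_alarm_request(
    instance_id: str,
    sns_topic_arn: str,
    threshold_percent: float = 80.0,
) -> dict[str, Any]:
    """Build kwargs for boto3's CloudWatch put_metric_alarm (hypothetical example).

    The alarm fires when the average CPU of one EC2 instance stays above the
    threshold for three consecutive 5-minute periods, and it notifies an SNS
    topic, which can in turn deliver email to its subscribers.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "AlarmDescription": f"Average CPU above {threshold_percent}% on {instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # seconds per datapoint
        "EvaluationPeriods": 3,  # consecutive breaching periods before ALARM
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "ActionsEnabled": True,
        "AlarmActions": [sns_topic_arn],  # notify when the alarm enters ALARM
        "OKActions": [sns_topic_arn],     # notify again when it recovers
    }
```

With credentials configured, the request could be sent with `boto3.client("cloudwatch").put_metric_alarm(**build_cpu_alarm_request(...))`. Keeping the request as plain data makes it easy to review before anything is created.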

Alternatives

  1. GCP Cloud Monitoring: It did not exist at the time we migrated to the cloud. Pending review.
  2. Azure Monitor: It did not exist at the time we migrated to the cloud. Pending review.

Usage

We use CloudWatch for monitoring:

  1. EC2 instance performance.
  2. EBS disk usage and performance.
  3. S3 bucket size and object number.
  4. Elastic Load Balancing (ELB) load balancer performance.
  5. Redshift database usage and performance.
  6. Redis cache cluster usage and performance.
  7. DynamoDB table usage and performance.
  8. SQS sent, delayed, received, and deleted messages.
  9. ECS cluster resource reservation and utilization.
  10. Lambda invocations, errors, duration, among others.
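To make the list above more concrete, the sketch below maps each monitored service to a CloudWatch namespace and a few example metric names, and builds `MetricDataQueries` entries in the shape boto3's `get_metric_data` expects. The specific metrics chosen, the service keys, and the helper name are assumptions for illustration; it also assumes the load balancers are Application Load Balancers and that the Redis clusters run on ElastiCache, neither of which this page states.

```python
from typing import Any

# Assumed mapping: service key -> (CloudWatch namespace, example metric names).
SERVICE_METRICS: dict[str, tuple[str, list[str]]] = {
    "ec2": ("AWS/EC2", ["CPUUtilization", "NetworkIn", "NetworkOut"]),
    "ebs": ("AWS/EBS", ["VolumeReadOps", "VolumeWriteOps", "VolumeQueueLength"]),
    "s3": ("AWS/S3", ["BucketSizeBytes", "NumberOfObjects"]),
    "alb": ("AWS/ApplicationELB", ["RequestCount", "TargetResponseTime"]),  # assumes ALB
    "redshift": ("AWS/Redshift", ["CPUUtilization", "PercentageDiskSpaceUsed"]),
    "redis": ("AWS/ElastiCache", ["CPUUtilization", "CurrConnections"]),  # assumes ElastiCache
    "dynamodb": ("AWS/DynamoDB", ["ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits"]),
    "sqs": ("AWS/SQS", ["NumberOfMessagesSent", "ApproximateNumberOfMessagesDelayed",
                        "NumberOfMessagesReceived", "NumberOfMessagesDeleted"]),
    "ecs": ("AWS/ECS", ["CPUReservation", "CPUUtilization",
                        "MemoryReservation", "MemoryUtilization"]),
    "lambda": ("AWS/Lambda", ["Invocations", "Errors", "Duration"]),
}


def build_metric_queries(
    service: str,
    dimensions: dict[str, str],
    stat: str = "Average",
    period: int = 300,
) -> list[dict[str, Any]]:
    """Build one get_metric_data query per example metric of a service (hypothetical helper)."""
    namespace, metric_names = SERVICE_METRICS[service]
    dims = [{"Name": name, "Value": value} for name, value in dimensions.items()]
    queries = []
    for index, metric_name in enumerate(metric_names):
        queries.append({
            "Id": f"m{index}",  # query Ids must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": dims,
                },
                "Period": period,  # seconds
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries
```

For example, `build_metric_queries("ecs", {"ClusterName": "example-cluster"})` yields four queries that could be passed as `MetricDataQueries` to `boto3.client("cloudwatch").get_metric_data(...)` together with `StartTime` and `EndTime`. The period and statistic should be chosen per metric: S3 storage metrics are reported once a day, and SQS message counts are usually summed rather than averaged.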

We do not use CloudWatch for:

  1. Synthetic monitoring: We use Checkly instead.
  2. ServiceLens: It only supports Lambda functions, API Gateway, and Java-based applications.
  3. Contributor Insights: We use Cloudflare instead.
  4. Container Insights: We use New Relic. Pending review.
  5. Lambda Insights: We currently use Lambda only for a few non-critical tasks.
  6. CloudWatch agent: It could increase visibility into EC2 machines. Pending review.
  7. CloudWatch Application Insights: It only supports Java-based applications.
  8. Writing our alarms as code using Terraform. Pending.

Guidelines

  1. You can access the CloudWatch console after authenticating to AWS.
  2. You can view CloudWatch metrics from the monitoring section of each AWS service.
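Besides each service's monitoring section, metrics from several services can be combined on one custom dashboard. The sketch below builds a dashboard body as a JSON string with a time-series widget and a single-value widget; the function name, layout, widget titles, and the choice of EC2 CPU and Lambda errors are hypothetical examples rather than an existing dashboard of ours.

```python
import json


def build_dashboard_body(region: str, instance_ids: list[str]) -> str:
    """Return a CloudWatch dashboard body as a JSON string (hypothetical layout).

    Widget 1: average CPU of several EC2 instances as a time series.
    Widget 2: total Lambda errors across all functions as a single number.
    """
    # One metric row per instance: [namespace, metric name, dimension name, dimension value].
    cpu_metrics = [["AWS/EC2", "CPUUtilization", "InstanceId", iid] for iid in instance_ids]
    body = {
        "start": "-PT6H",  # relative time range: the last 6 hours
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "metrics": cpu_metrics,
                    "view": "timeSeries",
                    "stat": "Average",
                    "period": 300,
                    "region": region,
                    "title": "EC2 CPU utilization",
                },
            },
            {
                "type": "metric",
                "x": 12, "y": 0, "width": 6, "height": 6,
                "properties": {
                    # No dimensions: Lambda's account-wide aggregate metric.
                    "metrics": [["AWS/Lambda", "Errors"]],
                    "view": "singleValue",
                    "stat": "Sum",
                    "period": 3600,
                    "region": region,
                    "title": "Lambda errors",
                },
            },
        ],
    }
    return json.dumps(body)
```

The resulting string could be passed as `DashboardBody` to `boto3.client("cloudwatch").put_dashboard(DashboardName=..., DashboardBody=...)`. Other `view` values, such as `bar` or `pie`, correspond to the diagram types mentioned in the Rationale.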