We use CloudWatch for monitoring all our AWS infrastructure. It allows us to monitor our applications, react to performance changes within those applications, optimize resource utilization, and get a unified view of operational health.
The main reasons why we chose it over other alternatives are:
- It is a core AWS service. Once one starts creating infrastructure, CloudWatch begins to monitor it.
- It seamlessly integrates with most AWS services. Some examples are EC2, S3, and DynamoDB.
- It complies with several certifications from ISO and CSA. Many of these certifications focus on ensuring that the entity follows best practices for secure cloud-based environments and information security.
- It supports custom dashboards that visualize metrics with widgets such as bar charts, pie charts, and single numbers, among others. Other customizations, such as time spans and using resource metrics as axes, are also available.
- It supports alarms through AWS SNS, which lets us trigger email notifications when resource metric conditions are not met or anomalies are detected.
- Resources can be written as code using Terraform.
The alternatives we have not evaluated yet are:
- GCP Cloud Monitoring: It did not exist at the time we migrated to the cloud. Review pending.
- Azure Monitor: It did not exist at the time we migrated to the cloud. Review pending.
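As an illustration of the SNS-backed alarms mentioned above, the following is a minimal sketch of the parameters that CloudWatch's `PutMetricAlarm` operation takes (here shaped for boto3's `put_metric_alarm`). The topic ARN, instance ID, alarm name, and threshold are hypothetical values, not our actual configuration, and email delivery assumes the SNS topic already has an email subscription.

```python
import json
from typing import Any, Dict

# Hypothetical identifiers, used only for illustration.
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-alerts"
INSTANCE_ID = "i-0123456789abcdef0"


def build_cpu_alarm(
    instance_id: str, topic_arn: str, threshold: float = 80.0
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's PutMetricAlarm operation."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds per evaluated data point
        "EvaluationPeriods": 2,  # consecutive breaching periods before ALARM
        "Threshold": threshold,  # assumed value, in percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic notified on ALARM
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to AWS (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the sketch runs without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Only print the definition; call create_alarm() to actually apply it.
    print(json.dumps(build_cpu_alarm(INSTANCE_ID, SNS_TOPIC_ARN), indent=2))
```

Keeping the definition as plain data separates what the alarm says from how it is applied, so the same values could later be moved into Terraform.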
We use CloudWatch for monitoring:
- EC2 instance performance.
- EBS disk usage and performance.
- S3 bucket size and number of objects.
- Elastic Load Balancing load balancer performance.
- Redshift database usage and performance.
- Redis cache cluster usage and performance.
- DynamoDB table usage and performance.
- SQS sent, delayed, received, and deleted messages.
- ECS cluster resource reservation and utilization.
- Lambda invocations, errors, and duration, among others.
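The list above maps roughly onto CloudWatch namespaces and metric names. The sketch below pairs each service with a few of its standard metrics and builds the arguments for the `GetMetricStatistics` operation (boto3's `get_metric_statistics`). The metric selection is an assumption made for illustration: it treats the load balancers as Application Load Balancers and Redis as ElastiCache, and it lists EBS I/O metrics because used disk space is not a default EBS metric. The helper and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

# Assumed mapping of monitored services to standard CloudWatch metrics.
MONITORED_METRICS: Dict[str, List[str]] = {
    "AWS/EC2": ["CPUUtilization", "NetworkIn", "NetworkOut"],
    "AWS/EBS": ["VolumeReadOps", "VolumeWriteOps"],
    "AWS/S3": ["BucketSizeBytes", "NumberOfObjects"],
    "AWS/ApplicationELB": ["RequestCount", "TargetResponseTime"],
    "AWS/Redshift": ["CPUUtilization", "PercentageDiskSpaceUsed"],
    "AWS/ElastiCache": ["CPUUtilization", "CurrConnections"],
    "AWS/DynamoDB": ["ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits"],
    "AWS/SQS": [
        "NumberOfMessagesSent",
        "ApproximateNumberOfMessagesDelayed",
        "NumberOfMessagesReceived",
        "NumberOfMessagesDeleted",
    ],
    "AWS/ECS": [
        "CPUReservation",
        "CPUUtilization",
        "MemoryReservation",
        "MemoryUtilization",
    ],
    "AWS/Lambda": ["Invocations", "Errors", "Duration"],
}


def build_statistics_request(
    namespace: str,
    metric_name: str,
    dimensions: List[Dict[str, str]],
    hours: int = 1,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build GetMetricStatistics arguments for the last `hours` hours."""
    if metric_name not in MONITORED_METRICS.get(namespace, []):
        raise ValueError(f"{namespace}/{metric_name} is not in the mapping")
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # five-minute data points
        "Statistics": ["Average", "Maximum"],
    }


if __name__ == "__main__":
    request = build_statistics_request(
        "AWS/Lambda",
        "Errors",
        [{"Name": "FunctionName", "Value": "example-function"}],  # hypothetical
    )
    print(request)
```

The resulting dictionary can be passed as keyword arguments to a boto3 CloudWatch client, which requires boto3 and AWS credentials and is therefore left out of the sketch.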
We do not use CloudWatch for:
- Synthetic monitoring: We use Checkly instead.
- ServiceLens: It only supports Lambda functions, API Gateway, and Java-based applications.
- Contributor Insights: We use Cloudflare instead.
- Container Insights: We use New Relic instead. Review pending.
- Lambda Insights: We only use Lambda for a few non-critical tasks.
- CloudWatch agent: It could increase visibility into EC2 machines. Review pending.
- CloudWatch Application Insights: It only supports Java-based applications.
- Writing our alarms as code using Terraform. Pending.