CloudWatch
Rationale
We use CloudWatch to monitor our entire AWS infrastructure. With it, we can monitor our applications, react to changes in their performance, optimize resource utilization, and get a unified view of operational health. We chose it over the alternatives for the following reasons:
- It is a core AWS service. Once we start creating infrastructure, CloudWatch begins to monitor it.
- It integrates seamlessly with most AWS services. Some examples are EC2, S3, and DynamoDB.
- It complies with several certifications from ISO and CSA. Many of these certifications are focused on ensuring that the entity follows best practices regarding secure cloud-based environments and information security.
- It supports custom dashboards for visualizing metrics with widgets such as bar charts, pie charts, and single-number displays, among others. Other customizations, such as time ranges and the choice of resource metrics for each axis, are also available.
- It supports alarms using AWS SNS, allowing email notifications to be triggered when resource metric conditions are not met or when anomalies are detected.
- Resources can be written as code using Terraform.
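As an illustration of the alarm-to-SNS flow described above, here is a minimal Python sketch that builds the parameters for CloudWatch's PutMetricAlarm API. It is not how our alarms are defined. The function names, the instance ID, the topic ARN, the 80% threshold, and the evaluation window are all hypothetical values chosen for the example. It assumes an SNS topic with an email subscription already exists.

```python
from typing import Any, Dict


def build_cpu_alarm(
    instance_id: str, topic_arn: str, threshold: float = 80.0
) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch PutMetricAlarm API.

    The alarm fires when the average CPU utilization of one EC2 instance
    stays above `threshold` percent for two consecutive 5-minute periods.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds per evaluation period
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # SNS delivers the notification (for example, by email) to subscribers.
        "AlarmActions": [topic_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_cpu_alarm works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    params = build_cpu_alarm(
        "i-0123456789abcdef0",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",
    )
    print(params["AlarmName"])
```

Keeping the parameter builder separate from the API call makes the alarm definition easy to review and test without touching an AWS account.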
Alternatives
GCP Cloud Monitoring and Azure Monitor are alternatives that did not exist at the time we migrated to the cloud (a review of each of them is pending).
Usage
We use CloudWatch for monitoring
- EC2 instance performance;
- EBS disk usage and performance;
- S3 bucket size and object number;
- ELB load balancer performance;
- Redshift database usage and performance;
- DynamoDB tables usage and performance;
- SQS sent, delayed, received and deleted messages;
- ECS cluster resource reservation and utilization, and
- Lambda invocations, errors, and duration, among others.
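Metrics like the ones listed above can also be read programmatically. The following Python sketch builds a request for CloudWatch's GetMetricStatistics API for three of them. The `METRICS` table, the function names, and the queue name are hypothetical. Only the namespaces, metric names, and dimension names come from AWS. It is an example under those assumptions, not a description of our tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

# Hypothetical lookup table: key -> (namespace, metric name, dimension name).
METRICS = {
    "ec2_cpu": ("AWS/EC2", "CPUUtilization", "InstanceId"),
    "sqs_sent": ("AWS/SQS", "NumberOfMessagesSent", "QueueName"),
    "lambda_errors": ("AWS/Lambda", "Errors", "FunctionName"),
}


def build_metric_query(
    key: str,
    resource: str,
    end: datetime,
    hours: int = 1,
    period: int = 300,
    statistic: str = "Average",
) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch GetMetricStatistics API."""
    namespace, metric_name, dimension = METRICS[key]
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension, "Value": resource}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # size of each datapoint bucket, in seconds
        "Statistics": [statistic],
    }


def fetch_datapoints(query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the query against CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_metric_query works offline

    response = boto3.client("cloudwatch").get_metric_statistics(**query)
    # Datapoints are not guaranteed to arrive in chronological order.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


if __name__ == "__main__":
    # "orders" is a hypothetical queue name.
    now = datetime.now(timezone.utc)
    query = build_metric_query("sqs_sent", "orders", now, statistic="Sum")
    print(query["Namespace"], query["MetricName"])
```

Count-style metrics such as sent messages or Lambda errors are usually read with the `Sum` statistic, while utilization metrics such as CPU are usually read with `Average`.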
We do not use CloudWatch for
- Synthetic monitoring (we use Checkly instead);
- ServiceLens (it only supports Lambda functions, API Gateway, and Java-based applications);
- Contributor Insights (we use Cloudflare instead);
- Container Insights (we use New Relic; pending review);
- Lambda Insights (we currently use Lambda for a few non-critical tasks);
- CloudWatch agent (it could increase visibility for EC2 machines; pending review);
- CloudWatch Application Insights (it only supports Java-based applications), or
- writing our alarms as code using Terraform (still pending).