CloudWatch
Rationale
We use CloudWatch to monitor our entire AWS infrastructure. With it, we can monitor our applications, react to changes in their performance, optimize resource utilization, and get a unified view of operational health. The main reasons we chose it over the alternatives are:
- It is a core AWS service. Once we start creating infrastructure, CloudWatch begins to monitor it.
- It integrates seamlessly with most AWS services. Some examples are EC2, S3, and DynamoDB.
- It complies with several certifications from ISO and CSA. Many of these certifications focus on ensuring that the provider follows best practices for secure cloud-based environments and information security.
- It supports custom dashboards for visualizing metrics with widgets such as bar charts, pie charts, and single-number displays, among others. Other customizations, such as time spans and the choice of resource metrics for each axis, are also available.
- It supports alarms through Amazon SNS, so email notifications can be sent when a resource metric breaches a defined threshold or an anomaly is detected.
- Its resources can be written as code using Terraform.
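To illustrate the alarm-to-SNS flow mentioned in the list above, here is a minimal Python sketch that builds the parameters for CloudWatch's PutMetricAlarm API and passes them to boto3's `put_metric_alarm`. The instance ID, topic ARN, alarm name, threshold, and evaluation window are hypothetical values chosen for the example, not our actual configuration. boto3 is a third-party dependency and is imported only when the alarm is actually created.

```python
from typing import Any, Dict


def build_cpu_alarm(
    instance_id: str, topic_arn: str, threshold: float = 80.0
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's PutMetricAlarm API.

    All values here are illustrative assumptions: the alarm fires when the
    average CPU utilization of one EC2 instance stays above `threshold`
    percent for two consecutive 5-minute periods.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # seconds
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic that receives the notification; email delivery
        # depends on the topic having an email subscription.
        "AlarmActions": [topic_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder runs without it

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

A call such as `create_alarm(build_cpu_alarm("i-0123456789abcdef0", "arn:aws:sns:us-east-1:123456789012:example-alerts"))` would register the alarm. Both identifiers are placeholders.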
Alternatives
Note: GCP Cloud Monitoring and Azure Monitor are alternatives that did not exist at the time we migrated to the cloud. A review of each of them is pending.
Usage
We use CloudWatch for monitoring:
- EC2 instance performance;
- EBS disk usage and performance;
- S3 bucket size and object count;
- ELB load balancer performance;
- Redshift database usage and performance;
- DynamoDB table usage and performance;
- SQS sent, delayed, received, and deleted messages;
- ECS cluster resource reservation and utilization; and
- Lambda invocations, errors, and duration, among others.
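As a rough sketch of how the metrics in the list above can be read programmatically, the Python example below maps each service to its CloudWatch namespace and a few of its standard metric names, then builds a request for the GetMetricStatistics API. The mapping is a sample rather than a full inventory of what we track. The `AWS/ApplicationELB` namespace assumes Application Load Balancers, and the queue name in the usage note is a placeholder. boto3 is a third-party dependency and is imported only when the request is sent.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional, Tuple

# Service -> (CloudWatch namespace, sample metric names). Illustrative only.
EXAMPLE_METRICS: Dict[str, Tuple[str, List[str]]] = {
    "EC2": ("AWS/EC2", ["CPUUtilization"]),
    "EBS": ("AWS/EBS", ["VolumeReadOps", "VolumeWriteOps"]),
    "S3": ("AWS/S3", ["BucketSizeBytes", "NumberOfObjects"]),
    # Assumes Application Load Balancers; Classic ELBs use "AWS/ELB".
    "ELB": ("AWS/ApplicationELB", ["RequestCount", "TargetResponseTime"]),
    "Redshift": ("AWS/Redshift", ["CPUUtilization", "PercentageDiskSpaceUsed"]),
    "DynamoDB": ("AWS/DynamoDB", ["ConsumedReadCapacityUnits", "ConsumedWriteCapacityUnits"]),
    "SQS": ("AWS/SQS", [
        "NumberOfMessagesSent",
        "ApproximateNumberOfMessagesDelayed",
        "NumberOfMessagesReceived",
        "NumberOfMessagesDeleted",
    ]),
    "ECS": ("AWS/ECS", ["CPUReservation", "CPUUtilization", "MemoryReservation", "MemoryUtilization"]),
    "Lambda": ("AWS/Lambda", ["Invocations", "Errors", "Duration"]),
}


def build_stats_request(
    namespace: str,
    metric_name: str,
    dimensions: List[Dict[str, str]],
    hours: int = 1,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for GetMetricStatistics over the last `hours`."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute data points
        "Statistics": ["Average"],
    }


def fetch_datapoints(request: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder runs without it

    response = boto3.client("cloudwatch").get_metric_statistics(**request)
    return response["Datapoints"]
```

For example, `build_stats_request("AWS/SQS", "NumberOfMessagesSent", [{"Name": "QueueName", "Value": "example-queue"}])` describes the last hour of sent-message data for a hypothetical queue. For a counter like this one, a `Sum` statistic may be more useful than the `Average` used in the sketch.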
We do not use CloudWatch for:
- synthetic monitoring (we use Checkly instead);
- ServiceLens (it only supports Lambda functions, API Gateway, and Java-based applications);
- Contributor Insights (we use Cloudflare instead);
- Container Insights (we use New Relic; pending review);
- Lambda Insights (we currently use Lambda only for a few non-critical tasks);
- the CloudWatch agent (it could increase visibility into EC2 instances; pending review);
- CloudWatch Application Insights (it only supports Java-based applications); or
- writing our alarms as code using Terraform (this work is pending).
Guidelines
- You can access the CloudWatch console after authenticating to AWS.
- You can view CloudWatch metrics in the monitoring section of each AWS service.