The main reasons we chose AWS Batch over the alternatives are:
- It is a managed service, so we do not need to manage any infrastructure directly.
- It is free; we only pay for the EC2 instances used to process workloads.
- It complies with several ISO and CSA certifications, many of which attest that the provider follows best practices for secure cloud-based environments and information security.
- Job logs can be monitored using CloudWatch.
- Jobs are highly resilient, meaning they rarely become unresponsive. This is essential when jobs take several days to finish.
- It supports EC2 spot instances, which considerably decreases EC2 costs.
- All its settings can be written as code using Terraform.
- We can use Nix to queue jobs easily.
- It supports priority-based queues, letting us prioritize jobs by assigning them to different queues.
- It supports automatic job retries.
- It integrates with IAM, allowing us to maintain a least-privilege approach to authentication and authorization.
- EC2 workers running jobs can be monitored using CloudWatch.
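Several of the features above (spot instances, priority queues, infrastructure as code) come together in Terraform. The following is a minimal sketch, not our actual configuration: every resource name, variable, and sizing value is illustrative, and the real settings live in the repository's Terraform modules.

```hcl
# Illustrative sketch: a spot-backed Batch compute environment and two
# priority-based job queues, all declared as code with Terraform.
resource "aws_batch_compute_environment" "workers" {
  compute_environment_name = "workers"           # hypothetical name
  type                     = "MANAGED"
  service_role             = aws_iam_role.batch_service.arn

  compute_resources {
    type               = "SPOT"                  # spot instances cut EC2 costs
    bid_percentage     = 100
    min_vcpus          = 0
    max_vcpus          = 16
    instance_type      = ["m5.large"]
    subnets            = var.subnets
    security_group_ids = var.security_groups
    instance_role      = aws_iam_instance_profile.ecs_instance.arn
  }
}

# Jobs are prioritized by the queue they are submitted to:
# a higher priority number is scheduled first.
resource "aws_batch_job_queue" "high" {
  name                 = "high"
  state                = "ENABLED"
  priority             = 10
  compute_environments = [aws_batch_compute_environment.workers.arn]
}

resource "aws_batch_job_queue" "low" {
  name                 = "low"
  state                = "ENABLED"
  priority             = 1
  compute_environments = [aws_batch_compute_environment.workers.arn]
}
```

Because both queues point at the same compute environment, urgent work simply goes to the `high` queue without needing a separate worker pool.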
Gitlab CI: We used it before adopting Batch. We migrated because Gitlab CI is not designed to run scheduled jobs that take many hours; jobs often became unresponsive before they could finish, mainly due to disconnections between the worker running the job and the Gitlab CI Bastion.
We use Batch for:
- Running Observes ETLs.
- Running Skims scans.
- Running Skims OWASP Benchmark.
- Running ASM Users to Entity reports.
- You can access the Batch console after authenticating on AWS.
- Any changes to Batch infrastructure must be done via Merge Requests.
- You can queue new jobs to Batch by using the compute-on-aws module.
- If a scheduled job takes longer than six hours, it should generally run in Batch; otherwise, it can run in Gitlab CI.
- To learn how to test and apply infrastructure via Terraform, visit the Terraform Guidelines.
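The six-hour rule of thumb above can be sketched as a tiny helper. The function name and return values are illustrative only; they are not part of the compute-on-aws module or any real API.

```python
def choose_runner(estimated_hours: float) -> str:
    """Decide where a scheduled job should run, per the rule of thumb:
    jobs taking longer than six hours go to Batch; shorter ones can
    stay in Gitlab CI."""
    return "batch" if estimated_hours > 6 else "gitlab-ci"


# A multi-day Observes ETL belongs in Batch:
print(choose_runner(48))   # batch
# A short scan fits within a regular CI job:
print(choose_runner(0.5))  # gitlab-ci
```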