SageMaker
Last updated: Feb 9, 2026
Rationale
SageMaker is the platform we use for developing solutions involving Machine Learning.
The main reasons why we chose it over other alternatives are:
- It integrates with EC2, allowing easy provisioning of cloud computing resources. This is essential for horizontal autoscaling.
- It holds several certifications from ISO and CSA, many of which focus on best practices for secure cloud-based environments and information security.
- It integrates with S3, allowing us to store raw data, datasets, and training outputs in our S3 Bucket.
- It supports a wide range of ML-specific EC2 instance types for training models.
- It supports EC2 Spot Instances, enabling a significant reduction in machine costs.
- Thanks to its horizontal autoscaling capabilities, parallelism is easy to implement: several models or feature combinations can run on separate machines at the same time, greatly reducing overall training time.
- It supports hyperparameter tuning, allowing the concurrent training of multiple instances of a model with different hyperparameter values. This is essential for finding our most accurate model.
- It integrates with IAM, enabling a least-privilege approach to authentication and authorization.
- It supports a wide range of frameworks, including scikit-learn, the one that Sorts uses.
- EC2 worker performance can be monitored via CloudWatch.
- Logs for training jobs can be monitored via CloudWatch.
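The parallelism described above can be illustrated with a minimal sketch. It only enumerates feature combinations and builds one independent job description per combination; in practice each description would be submitted as a separate training job on its own machine. The feature names, the `train-N` naming scheme, and the helper functions are hypothetical and are not taken from Sorts.

```python
from itertools import combinations

# Hypothetical feature names, used only for illustration.
FEATURES = ["num_commits", "file_age", "num_lines"]


def feature_combinations(features, min_size=1):
    """Yield every subset of `features` with at least `min_size` elements."""
    for size in range(min_size, len(features) + 1):
        yield from combinations(features, size)


def build_job_specs(features, min_size=1):
    """Return one independent job description per feature combination."""
    return [
        {"job_name": f"train-{index}", "features": list(combo)}
        for index, combo in enumerate(feature_combinations(features, min_size))
    ]


# Each spec is independent of the others, so all of them could be
# trained at the same time on separate machines.
job_specs = build_job_specs(FEATURES, min_size=2)
for spec in job_specs:
    print(spec["job_name"], spec["features"])
```

Because no job depends on the result of another, the number of machines can grow with the number of combinations without changing the training code.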
Alternatives
- IBM Watson Studio: It does not integrate with EC2 or S3, increasing overall complexity (pending review).
- GCP Vertex AI: It does not integrate with EC2 or S3, increasing overall complexity (pending review).
- Azure Machine Learning: It does not integrate with EC2 or S3, increasing overall complexity (pending review).
Usage
- We use SageMaker as the Machine Learning platform for training Sorts, our ML-based software vulnerability scanner.
- We do not use SageMaker spot instances (pending implementation).
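As a rough sketch of what the pending spot implementation might involve, the snippet below builds the spot-related fields of a SageMaker `CreateTrainingJob` request (`EnableManagedSpotTraining`, `StoppingCondition`, and `CheckpointConfig`). It does not call AWS. The helper name, the time limits, and the bucket path are assumptions made for illustration; the remaining required request fields (role, algorithm, data channels, resources) are omitted.

```python
def spot_training_settings(max_runtime_s, max_wait_s, checkpoint_s3_uri):
    """Build the spot-related fields of a CreateTrainingJob request.

    Hypothetical helper: it only assembles a dictionary and performs
    one sanity check; it does not contact AWS.
    """
    # The maximum wait time covers both waiting for spot capacity and
    # the training itself, so it cannot be shorter than the runtime limit.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Checkpoints let an interrupted spot job resume instead of
        # starting over.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


# Example values only; the bucket name is a placeholder.
settings = spot_training_settings(
    max_runtime_s=3600,
    max_wait_s=7200,
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",
)
print(settings)
```

These fields would be merged into the full request passed to the training job API. Since spot capacity can be reclaimed at any time, the training script would also need to save and restore checkpoints for the setting to be useful.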