SageMaker
Rationale
SageMaker is the platform we use for developing solutions involving Machine Learning.
The main reasons why we chose it over other alternatives are:
- It integrates with EC2, making it easy to provision cloud computing resources. This is essential for horizontal autoscaling.
- It complies with several certifications from ISO and CSA. Many of these certifications focus on ensuring that the entity follows best practices for secure cloud-based environments and information security.
- It integrates with S3, allowing us to store raw data, datasets, and training outputs in our S3 bucket.
- It supports a wide range of ML-specific EC2 machines for training models.
- It supports EC2 spot machines, enabling a significant reduction in machine costs.
- Thanks to its horizontal autoscaling capabilities, it is easy to implement parallelism by running several models or feature combinations on separate machines, which greatly reduces total training time.
- It supports hyperparameter tuning, allowing the concurrent training of multiple instances of a model with different parameter values. This is essential for optimizing our most accurate model.
- It integrates with IAM, enabling a least-privilege approach to authentication and authorization.
- It supports a wide range of frameworks, including scikit-learn, the one that Sorts uses.
- EC2 worker performance can be monitored via CloudWatch.
- Logs for training jobs can be monitored via CloudWatch.
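The parallelism and hyperparameter points above can be pictured with a minimal sketch that only plans the work: it enumerates one job per pair of feature subset and parameter set, so that each job could then be sent to its own machine. The feature names, the parameter grid, and the `plan_jobs` helper are hypothetical and chosen for illustration; they are not the actual Sorts configuration, and no SageMaker API is called here.

```python
from itertools import combinations, product

# Hypothetical feature names and hyperparameter values, for illustration only.
FEATURES = ["commit_frequency", "file_age", "num_authors"]
PARAM_GRID = {"n_estimators": [50, 100], "max_depth": [4, 8]}


def feature_combinations(features, min_size=2):
    """Yield every subset of `features` with at least `min_size` items."""
    for size in range(min_size, len(features) + 1):
        yield from combinations(features, size)


def parameter_sets(grid):
    """Yield one dict per point in the Cartesian product of the grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def plan_jobs(features, grid, min_size=2):
    """Return one job description per (feature subset, parameter set) pair."""
    return [
        {"features": list(subset), "hyperparameters": params}
        for subset in feature_combinations(features, min_size)
        for params in parameter_sets(grid)
    ]


jobs = plan_jobs(FEATURES, PARAM_GRID)
print(f"{len(jobs)} independent training jobs to distribute")
```

Because the jobs share no state, they can run concurrently, and the wall-clock time is bounded by the slowest job rather than by the sum of all of them.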
Alternatives
- IBM Watson Studio: It does not integrate with EC2 or S3, increasing overall complexity (pending review).
- GCP Vertex AI: It does not integrate with EC2 or S3, increasing overall complexity (pending review).
- Azure Machine Learning: It does not integrate with EC2 or S3, increasing overall complexity (pending review).
Usage
- We use SageMaker as the Machine Learning platform for training Sorts, our ML-based software vulnerability scanner.
- We do not use SageMaker spot instances (pending implementation).
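Since spot instances are still pending, the following is only a hedged sketch of the settings such a change might involve. It assumes the SageMaker Python SDK, whose `Estimator` accepts the keyword arguments `use_spot_instances`, `max_run`, and `max_wait`; the helper name `spot_training_settings` and the example durations are hypothetical. The block builds and validates the arguments and does not start any training job.

```python
def spot_training_settings(max_run_seconds, max_wait_seconds):
    """Build spot-related keyword arguments for a SageMaker Estimator.

    Assumption: `max_wait` covers both the time spent waiting for spot
    capacity and the training time itself, so it must not be smaller
    than `max_run`.
    """
    if max_run_seconds <= 0:
        raise ValueError("max_run_seconds must be positive")
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait_seconds must be >= max_run_seconds")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
    }


# Hypothetical durations: up to one hour of training, two hours in total.
settings = spot_training_settings(max_run_seconds=3600, max_wait_seconds=7200)
print(settings)
```

The resulting dictionary could be unpacked into an estimator constructor, for example `Estimator(..., **settings)`, with the remaining arguments left to the actual training setup.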