Sorts
Sorts is the product responsible for helping End Users sort files in a Git repository by their probability of containing security vulnerabilities. It does so by using Machine Learning to produce a Model that is then used by:
- End users, through:
- Fluid Attacks internal systems, to update the priority in Integrates of the source code that Fluid Attacks Hackers audit.
Public Oath
None at the moment. Sorts is still an experimental project.
Using Sorts
Make sure you have the following tools installed in your system:
Now you can use Sorts by calling:
$ m gitlab:fluidattacks/universe@trunk /sorts
You can use the --help flag to learn more about what Sorts can do for you.
The main function of Sorts is to analyze a repository and output a file listing each file name with its probability of being vulnerable. This can be done with the following command:
$ m gitlab:fluidattacks/universe@trunk /sorts /path/to/repository
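Once Sorts has produced its output file, you may want to look at the riskiest files first. The following is a minimal sketch of that idea, assuming the output is a CSV. The column names `file` and `prob_vuln`, the helper `top_risky_files`, and the sample values are all hypothetical and not taken from Sorts itself, so adjust them to the file you actually get.

```python
import csv
import io


def top_risky_files(csv_text, limit=10):
    """Return up to `limit` (file, probability) pairs, highest probability first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # "file" and "prob_vuln" are hypothetical column names, not confirmed by Sorts.
    rows = [(row["file"], float(row["prob_vuln"])) for row in reader]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up sample data standing in for a real Sorts output file.
sample = "file,prob_vuln\nsrc/auth.py,87.5\nREADME.md,3.0\nsrc/db.py,64.2\n"
for name, probability in top_risky_files(sample, limit=2):
    print(f"{probability:6.1f}  {name}")
```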
You can also use Sorts in CI/CD mode, which lets you run it in a development pipeline to analyze a commit and change the rules for merging that commit into the main branch. This can be done with the following command:
$ m gitlab:fluidattacks/universe@trunk /sorts --mode ci /platform/path/to/repository
In order to use this mode correctly you need to define a configuration file. You can check the explanation for the configuration format in the Configuration guidelines.
Architecture
Sorts uses a Machine Learning Pipeline architecture. That is, many models are trained and improved automatically as the data evolves, and the best model is chosen according to predefined criteria set up by the Sorts Maintainer.
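To illustrate what "choosing the best model according to predefined criteria" can look like, here is a minimal sketch. The actual criteria used by the Sorts Maintainer are not described here, so the F1 score, the recall floor, and the names `Candidate` and `select_best` are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Evaluation results for one trained model (hypothetical structure)."""
    name: str
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        total = self.precision + self.recall
        return 0.0 if total == 0 else 2 * self.precision * self.recall / total


def select_best(candidates, min_recall=0.0):
    """Pick the highest-F1 candidate among those meeting a recall floor."""
    eligible = [c for c in candidates if c.recall >= min_recall]
    if not eligible:
        return None  # No model satisfies the criteria; keep the previous one.
    return max(eligible, key=lambda c: c.f1)


# Made-up metrics for three candidate models.
candidates = [
    Candidate("model-a", precision=0.90, recall=0.50),
    Candidate("model-b", precision=0.70, recall=0.80),
    Candidate("model-c", precision=0.95, recall=0.30),
]
best = select_best(candidates, min_recall=0.6)
print(best.name if best else "no eligible model")
```

Encoding the criteria as a pure function makes it easy to rerun the selection automatically every time a new batch of models is trained.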
At all points in time, we report statistics to the Redshift cluster provided by Observes, which allows us to monitor the progress of the model over time, and intervene in the pipeline if something doesn't go as planned.
The source data comes from files at Integrates that human hackers have reviewed to determine whether they are vulnerable, and each file is labeled accordingly. The filenames are then used to extract Git characteristics from their repository's commit history, which produces the dataset used to train the Sorts model.
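As a rough illustration of extracting Git characteristics from a commit history, the sketch below counts commits and distinct authors per file. These two features and the function names `parse_git_log` and `git_features` are assumptions chosen for the example; the features Sorts actually computes may differ.

```python
import subprocess
from collections import defaultdict


def parse_git_log(log_text):
    """Count commits and distinct authors per file.

    Expects the output of `git log --name-only --format=@%ae`, where each
    commit starts with a line "@<author email>" followed by the paths it
    touched. Paths that themselves start with "@" would be misread.
    """
    commits = defaultdict(int)
    authors = defaultdict(set)
    author = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@"):
            author = line[1:]  # New commit: remember who authored it.
        elif author is not None:
            commits[line] += 1
            authors[line].add(author)
    return {
        path: {"num_commits": commits[path], "num_authors": len(authors[path])}
        for path in commits
    }


def git_features(repo_path):
    """Run `git log` on a local clone and return per-file features."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--format=@%ae"],
        check=True, capture_output=True, text=True,
    )
    return parse_git_log(result.stdout)


# Made-up log output standing in for a real repository history.
sample_log = "@ana@example.com\n\nsrc/auth.py\nREADME.md\n@bob@example.com\n\nsrc/auth.py\n"
print(parse_git_log(sample_log))
```

Keeping the parsing separate from the `git` invocation means the feature logic can be checked without a repository on disk.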
Roughly, the pipeline consists of the following steps:
- /sorts/extract-features: Clones all source code repositories and produces a CSV with the features for each file in the repository.
- /sorts/merge-features: Takes the features from the previous step and merges them into a single CSV.
- /sorts/training-and-tune: Takes the CSV from the previous step and trains different models using SageMaker by Amazon Web Services. SageMaker takes the training data from the previous step and uploads the trained model to a bucket on S3 by Amazon Web Services. Last but not least, the best model is selected automatically and uploaded to the same bucket, but in a constant location. This Best Model is the output of the pipeline.
- /sorts/execute: Takes the "Best Model" and uses it to prioritize the files at Integrates.
- /sorts/association*: This is an attempt to make Sorts recommend the type of vulnerability that a file may contain as well.
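The merge step in the pipeline above can be pictured with a small sketch. It assumes, purely for illustration, that the extraction step leaves one CSV per repository and that all of them share the same columns. The function name `merge_feature_csvs` and the column names are hypothetical.

```python
import csv
import io


def merge_feature_csvs(csv_texts):
    """Concatenate per-repository feature CSVs that share the same header."""
    header = None
    merged_rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if reader.fieldnames is None:
            continue  # Skip empty inputs.
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("feature CSVs must share the same columns")
        merged_rows.extend(reader)

    out = io.StringIO()
    if header:
        writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
        writer.writeheader()
        writer.writerows(merged_rows)
    return out.getvalue()


# Made-up feature files for two repositories.
repo_a = "file,num_commits\na.py,3\n"
repo_b = "file,num_commits\nb.py,7\n"
print(merge_feature_csvs([repo_a, repo_b]))
```

Rejecting mismatched headers early keeps a malformed extraction run from silently corrupting the training dataset.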
Contributing
Please read the contributing page first.
Development Environment
Follow the steps in the Development Environment section of our documentation.
If prompted for an AWS role, choose dev, and when prompted for a Development Environment, pick sorts.