Sorts

Sorts is the product responsible for helping End Users sort the files in a Git repository by their probability of containing security vulnerabilities. It does so by using Machine Learning to produce a Model, which is then used to prioritize files.

Public Oath

None at the moment; Sorts is still an experimental project.

Using Sorts

  1. Make sure you have the following tools installed on your system:

  2. Now you can use Sorts by calling:

    $ m gitlab:fluidattacks/universe@trunk /sorts

    You can use the --help flag to learn more about what Sorts can do for you.

    The main function of Sorts is to analyze a repository and output a file listing filenames with their corresponding probability of being vulnerable. This can be done with the following command:

     $ m gitlab:fluidattacks/universe@trunk /sorts /path/to/repository
  3. You can also use Sorts in CI/CD mode, which allows you to run it in a development pipeline to analyze a commit and adjust the rules for merging that commit into the main branch. This can be done with the following command:

     $ m gitlab:fluidattacks/universe@trunk /sorts --mode ci /platform/path/to/repository

    In order to use this mode correctly, you need to define a configuration file. The configuration format is explained in the Configuration guidelines.

Architecture

  1. Sorts uses a Machine Learning Pipeline architecture: many models are trained and improved automatically as the data evolves, and the best model is chosen according to predefined criteria set up by the Sorts Maintainer.

    At all points in time, we report statistics to the Redshift cluster provided by Observes, which allows us to monitor the progress of the model over time, and intervene in the pipeline if something doesn't go as planned.
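    The "best model is chosen according to some predefined criteria" step can be pictured as maximizing a predefined metric over the candidate models. A minimal illustrative sketch, not Sorts' actual code; the model names, metric names, and values below are all hypothetical:

```python
# Hypothetical candidates and metrics, purely for illustration.
candidates = {
    "mlp": {"f1": 0.61, "precision": 0.58},
    "xgboost": {"f1": 0.67, "precision": 0.64},
    "random_forest": {"f1": 0.64, "precision": 0.70},
}

def best_model(metrics: dict, criterion: str = "f1") -> str:
    # Pick the candidate that maximizes the chosen criterion.
    return max(metrics, key=lambda name: metrics[name][criterion])

print(best_model(candidates))              # best by f1
print(best_model(candidates, "precision"))  # best by precision
```

    The criterion itself is what the Maintainer tunes; everything downstream of it stays automatic.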

  2. The source data comes from files at Integrates, which human hackers review to determine whether they are vulnerable and label accordingly. The filenames are then used to extract Git characteristics from each repository's commit history, producing the dataset used to train the Sorts model.
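    Extracting Git characteristics for a file can be sketched as follows. The two features shown (commit count and distinct author count) are illustrative assumptions, not necessarily the features Sorts actually uses:

```python
import subprocess

def features_from_authors(author_emails: list) -> dict:
    # Pure helper: derive features from the commit author emails of a file.
    return {
        "num_commits": len(author_emails),       # how often the file changed
        "num_authors": len(set(author_emails)),  # how many people touched it
    }

def git_features(repo: str, filename: str) -> dict:
    # One author email per commit that touched the file.
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%ae", "--", filename],
        capture_output=True, text=True, check=True,
    ).stdout
    return features_from_authors(out.splitlines())
```

    Combined with the human-assigned vulnerable/safe label, rows like these form the training dataset.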

  3. Roughly, the pipeline consists of the following steps:

    • /sorts/extract-features: Clones all source code repositories and produces a CSV with the features of each file in the repository.

    • /sorts/merge-features: Takes the features from the previous step and merges them into a single CSV.

    • /sorts/training-and-tune: Takes the CSV from the previous step and trains different models using SageMaker on Amazon Web Services.

      SageMaker takes the training data from the previous step and uploads the trained model to an Amazon S3 bucket.

      Last but not least, the best model is selected automatically and uploaded to the same bucket, at a fixed location.

    This Best Model is the output of the pipeline.
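    The merge step amounts to concatenating the per-repository feature CSVs while keeping a single header. A hedged sketch, assuming all CSVs share the same header; the function name and file layout are illustrative, not Sorts' actual interface:

```python
import csv
import glob

def merge_features(pattern: str, output: str) -> None:
    # Concatenate every CSV matching `pattern` into `output`,
    # writing the shared header exactly once.
    with open(output, "w", newline="") as out:
        writer = csv.writer(out)
        header_written = False
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader)
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)
```

    Sorting the matched paths keeps the merge deterministic across runs.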

  4. /sorts/execute: Takes the "Best Model" and uses it to prioritize the files at Integrates.
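    Prioritization can be pictured as scoring every file with the model and sorting by descending probability of being vulnerable. The stand-in model below is purely illustrative (a toy rule with a scikit-learn-style predict_proba); Sorts' real model is the one trained and stored in S3:

```python
class StubModel:
    # Toy stand-in: more commits -> higher vulnerability probability.
    def predict_proba(self, rows):
        return [[1 - min(r[0] / 10, 1), min(r[0] / 10, 1)] for r in rows]

def prioritize(model, files: dict) -> list:
    # `files` maps filename -> feature row. Returns (filename, probability)
    # pairs, most likely vulnerable first.
    names = list(files)
    probs = model.predict_proba([files[n] for n in names])
    return sorted(
        zip(names, (p[1] for p in probs)),
        key=lambda pair: pair[1],
        reverse=True,
    )

ranking = prioritize(StubModel(), {"a.py": [2], "b.py": [7], "c.py": [4]})
```

    The ranked list is what End Users see: the files most worth reviewing first.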

  5. /sorts/association*: An attempt to make Sorts also recommend the type of vulnerability that a file may contain.

tip

You can right-click on the image below to open it in a new tab, or save it to your computer.

Architecture of Sorts

Contributing

Please read the contributing page first.

Development Environment

Follow the steps in the Development Environment section of our documentation.

If prompted for an AWS role, choose dev, and when prompted for a Development Environment, pick sorts.