A set of statistical metrics to better understand and qualify malware datasets.
Médéric Hurier e62a24e59f Update 'LICENSE.txt' 3 months ago


What is STASE?

STASE provides a set of metrics to describe a dataset of malware labels. It can be used to:


  • evaluate the properties of malware datasets
  • identify potential bias in experimental studies
  • analyze the decision and classification of antivirus products


Input: a dataset of labels formatted as a CSV or CSV.GZ file

  • columns: antivirus products
  • rows: malware files
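To make the input layout concrete, here is a minimal sketch of reading such a file with the Python standard library. It assumes a header row naming the antivirus products and an empty cell when a product reports nothing; neither detail is confirmed above, and the product names and labels in the demo are made up.

```python
import csv
import gzip
import io


def open_text(path):
    """Open a .csv or .csv.gz file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", newline="", encoding="utf-8")
    return open(path, mode="r", newline="", encoding="utf-8")


def read_labels(stream):
    """Return (products, rows): header names and one list of labels per file."""
    reader = csv.reader(stream)
    products = next(reader)         # assumed header row: one column per antivirus product
    rows = [row for row in reader]  # one row per malware file
    return products, rows


# Tiny in-memory dataset (hypothetical product names and labels, illustration only).
demo = io.StringIO("av_a,av_b\nTrojan.X,Trojan.Y\n,Worm.Z\n")
products, rows = read_labels(demo)
print(products, len(rows))
```

For a real dataset, `read_labels(open_text("sample.csv.gz"))` would follow the same path, provided the file matches these assumptions.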

Output: the metrics introduced in this research paper (soon to be released), written to a JSON file


python3 stase.py sample.csv.gz output.json

    "equiponderance": 0.2422919148,
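Once the run completes, a metric can be read back from the JSON output. The sketch below is only an illustration: the full layout of output.json is not shown here, so the helper (a hypothetical `find_metric`) searches nested objects for a key instead of assuming a fixed path. Only the "equiponderance" key and its value come from the excerpt above; the surrounding nesting is invented.

```python
import json


def find_metric(node, name):
    """Depth-first search for key `name` in nested dicts/lists; None if absent."""
    if isinstance(node, dict):
        if name in node:
            return node[name]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_metric(child, name)
        if found is not None:
            return found
    return None


# Hypothetical nesting; the real output.json may be organised differently.
report = json.loads('{"dataset": {"equiponderance": 0.2422919148}}')
value = find_metric(report, "equiponderance")
print(value)
```

With a real run, `report` would come from `json.load` on the opened output.json file.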

Technical details:

  • implemented in Python 3 (dependencies in requirements.txt)
  • uses multiprocessing for performance
  • ships with Ouroboros (ouroboros.py)
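How stase.py divides its work between processes is not described here. As a rough sketch of the general idea, the example below spreads a per-product computation over worker processes with `multiprocessing.Pool`. The functions `count_labels` and `count_all` and the sample labels are hypothetical and are not taken from stase.py.

```python
from collections import Counter
from multiprocessing import Pool


def count_labels(column):
    """Count how often each non-empty label appears in one product's column."""
    return Counter(label for label in column if label)


def count_all(columns, processes=2):
    """Apply count_labels to every column in parallel worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(count_labels, columns)


if __name__ == "__main__":
    # Made-up labels for two hypothetical products.
    columns = [["Trojan.X", "", "Trojan.X"], ["Worm.Z", "Worm.Z", "Adware.Q"]]
    for counts in count_all(columns):
        print(counts.most_common(1))
```

Working per column keeps each task independent, so results can be collected in order without shared state.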


  • planned: handle more input formats and options

Pull requests are welcome!