This repository contains the Juliet C/C++ test suite adapted for dynamic analysis tools. You can use it to measure True Positive and True Negative rates for your dynamic tool. This repository was used to obtain the results for the "Symbolic Security Predicates: Hunt Program Weaknesses" paper. However, feel free to add support for your own tool.
$ sudo apt install clang-10 gcc-multilib
Building all CWEs may take a long time. Build only the CWEs required to measure Sydr:
$ CC=clang-10 CXX=clang++-10 make -j15 CWE121 CWE122 CWE124 CWE126 CWE127 CWE190 CWE191 CWE194 CWE195 CWE369 CWE680
usage: test_juliet.py [-h]
                      [-c [CWE_NUM_1 CWE_NUM_2 ... [CWE_NUM_1 CWE_NUM_2 ... ...]]]
                      [-t TOOL] [-e] [-d] [-r] [-j THREADS]

optional arguments:
  -h, --help            show this help message and exit
  -c [CWE_NUM_1 CWE_NUM_2 ... [CWE_NUM_1 CWE_NUM_2 ... ...]], --cwe [CWE_NUM_1 CWE_NUM_2 ... [CWE_NUM_1 CWE_NUM_2 ... ...]]
                        Run specified CWE.
  -t TOOL, --tool TOOL  Tool name.
  -e, --error           Print false positive and false negative tests.
  -d, --delete          Delete results and collect them again.
  -r, --reproduce       Recalculate statistics and reproduce sanitizers
                        verification.
  -j THREADS            Set number of threads.
You can measure TPR and TNR for your tool via the test_juliet.py script, which does the following:
- The script runs your tool on all Juliet test cases.
- Each test case is executed on a valid input that does not lead to an error.
- Your tool should generate new inputs that may trigger an error.
- The script assigns a class to each test case (the logic is sketched in code below):
  - TP to a positive test case if at least one input was generated;
  - FP to a negative test case if at least one input was generated;
  - TN to a negative test case if no inputs were generated;
  - FN to a positive test case if no inputs were generated.
- Afterwards, the script verifies the TP cases with sanitizers: TP is kept for a test case if sanitizers signal an error for at least one generated input; otherwise, the test case is reassigned to FN.
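For illustration, the classification logic above can be summarized as the following minimal sketch (classify_case and its argument names are hypothetical, not the actual code in test_juliet.py):

def classify_case(is_positive, generated_inputs, sanitizers_confirm):
    # Classify a single Juliet test case from the tool's output.
    # is_positive        -- True for a positive (buggy) test case.
    # generated_inputs   -- list of inputs generated by the tool.
    # sanitizers_confirm -- True if sanitizers signal an error for at
    #                       least one generated input.
    if not generated_inputs:
        return "FN" if is_positive else "TN"
    if not is_positive:
        return "FP"
    # A TP is kept only if sanitizers confirm the error.
    return "TP" if sanitizers_confirm else "FN"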
The following script runs Sydr on all Juliet test cases, determines classes (TP, FP, TN, FN) for each test case, and verifies generated inputs on sanitizers:
$ ./test_juliet.py -j4
The script will generate a results/stats.json file containing the class (TP, FP, TN, FN) and the sanitizers verification result for each checked Juliet test case. Moreover, the script prints the overall TPR, TNR, and ACC before and after sanitizers verification. Subsequent runs of test_juliet.py will only print the already collected statistics.
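For reference, the overall metrics are computed in the standard way (a minimal sketch; the function name is illustrative):

def rates(tp, fp, tn, fn):
    # Overall metrics from per-class test case counts.
    tpr = tp / (tp + fn)                   # True Positive Rate
    tnr = tn / (tn + fp)                   # True Negative Rate
    acc = (tp + tn) / (tp + fp + tn + fn)  # Accuracy
    return tpr, tnr, acc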
The following command prints all False Positive and False Negative cases:
$ ./test_juliet.py -e
Save statistics to a file (results must already be collected):
$ ./test_juliet.py > stats.txt
Generate a LaTeX file:
$ ./table.py > stats.tex
Build the PDF:
$ pdflatex stats.tex
Generate the SVG:
$ pdf2svg stats.pdf stats.svg
If you do not have Sydr, you can reproduce the statistics from the raw results:
- Extract the results archive in the Juliet root directory:
$ tar xf results.tar.xz
- The following script will remove results/stats.json and rerun sanitizers verification:
$ ./test_juliet.py --reproduce -j4
If you want to generate the LaTeX table, see the section above.
To support your own tool, you just need to implement a run_TOOLNAME function in test_juliet.py. This function accepts the path to a single Juliet test case binary, the CWE enum for the test case, and the path to the corresponding input file containing stdin. The function should return the list of new inputs generated by your tool.
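A minimal sketch of such a function is shown below; the tool name mytool and its command-line flags are purely hypothetical, so substitute your tool's real interface:

import glob
import os
import subprocess

def run_mytool(binary, cwe, stdin_file):
    # binary     -- path to a single Juliet test case binary.
    # cwe        -- CWE enum for the test case.
    # stdin_file -- path to the input file containing stdin.
    out_dir = binary + "_mytool_out"
    os.makedirs(out_dir, exist_ok=True)
    # Hypothetical invocation; replace with your tool's actual command line.
    subprocess.run(["mytool", "--stdin", stdin_file, "--out", out_dir,
                    "--", binary], check=False)
    # Return the list of newly generated inputs.
    return glob.glob(os.path.join(out_dir, "*"))

Then just run: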
$ ./test_juliet.py -j4 -t TOOLNAME
We evaluated Sydr security predicates on the Juliet test suite. See the results below:
@inproceedings{vishnyakov21,
  title = {Symbolic Security Predicates: Hunt Program Weaknesses},
  author = {Vishnyakov, Alexey and Logunova, Vlada and Kobrin, Eli and Kuts, Daniil and Parygina, Darya and Fedotov, Andrey},
  booktitle = {2021 Ivannikov ISPRAS Open Conference (ISPRAS)},
  year = {2021},
  publisher = {IEEE},
  url = {https://arxiv.org/abs/2111.05770},
}