Directory Structure
.
├── data # directory with scanner results
│
├── kalm_benchmark # source code of the package
│ ├── cli.py # Command line interface
│ ├── evaluation # module for evaluating the scanner results
│ │ ├── scanner # module containing dedicated scripts per scanner for parsing its results
│ │ │ ├── ...
│ │ │ └── scanner_evaluator.py # the base for the scanner specific scripts
│ │ └── evaluation.py # the core of the evaluation module
│ ├── manifest_generator # module for generating the benchmark manifests
│ │ ├── cdk8s_imports # (generated) imports of k8s definitions generated by cdk8s
│ │ ├── constants.py # collection of constants shared across manifests
│ │ ├── gen_manifests.py # entry-point for generating the manifests
│ │ ├── ...
│ │ └── workload # definition of workload related manifests
│ └── ui # module for the visualization of the evaluation
│
├── manifests # (generated) the target directory for generated manifests
├── notebooks # folder containing all notebooks used for the analysis
├── tests # all unit-tests mirroring source code structure
└── tox.ini # tox file with settings for flake8 and running tox
The project consists of 2 main modules:
manifest_generator
: the code for the generation and management of the manifests for the benchmark
evaluation
: contains all the code for scanners and their evaluations
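Each scanner-specific script under kalm_benchmark/evaluation/scanner builds on the base defined in scanner_evaluator.py. The following is a minimal, hypothetical sketch of that pattern; the class and method names are illustrative assumptions, not the actual API of the module.

```python
# Hypothetical sketch of a scanner-specific evaluator; the names used here are
# illustrative assumptions, not the actual API of scanner_evaluator.py.
from abc import ABC, abstractmethod


class ScannerBase(ABC):
    """Assumed common interface implemented by every scanner-specific script."""

    NAME: str = ""

    @abstractmethod
    def parse_results(self, raw: dict) -> list[dict]:
        """Turn the raw scan output into a list of uniform check records."""


class MyScanner(ScannerBase):
    NAME = "my-scanner"

    def parse_results(self, raw: dict) -> list[dict]:
        # Keep only the fields the evaluation pipeline needs from each finding.
        return [
            {
                "check_id": finding.get("id"),
                "checked_path": finding.get("path"),
                "severity": finding.get("severity"),
            }
            for finding in raw.get("findings", [])
        ]
```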
To install the project and all of its dependencies, execute:
poetry install
After the installation, the pre-commit hooks must be installed:
poetry run pre-commit install
This installs the following tools with minor adjustments:
Poetry is used for the management of this project. Thus, to install a new dependency, also use Poetry instead of pip:
poetry add <dependency>
For unit tests, pytest is used. To run all tests, enter:
poetry run pytest
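The tests directory mirrors the source code structure. As a hedged illustration of what such a test could look like (the file location and the parsing helper shown here are assumptions for illustration, not the project's actual code):

```python
# tests/evaluation/scanner/test_example_scanner.py
# Hypothetical example of a unit test mirroring the source layout; the parsing
# helper below is a stand-in for illustration, not the project's implementation.
def parse_check_ids(raw_findings: list[dict]) -> list[str]:
    """Tiny stand-in for a scanner-specific parsing step."""
    return [finding["id"] for finding in raw_findings if "id" in finding]


def test_parse_check_ids_skips_entries_without_id():
    findings = [{"id": "POD-001"}, {"severity": "low"}, {"id": "POD-002"}]
    assert parse_check_ids(findings) == ["POD-001", "POD-002"]
```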
The evaluation of the scanner results runs through a unified pipeline. First, the results of a scanner are loaded and parsed. These steps are specific to every scanner and can be customized in the respective implementation. Afterwards, the results are structured as a table and any missing check IDs are imputed. In parallel, the benchmark table is created by implicitly generating the manifests using cdk8s.
Both tables are merged on their check_id column and the respective column containing information about the checked path.
Finally, the outcome is post-processed by removing duplicates and missing values and categorizing the checks.
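A rough sketch of the merge and post-processing steps using pandas is shown below; the exact column names, join strategy, and fill values are assumptions for illustration, not the project's actual implementation.

```python
# Rough sketch of the merge and post-processing steps; column names, join
# strategy, and fill values are assumptions, not the actual implementation.
import pandas as pd


def merge_results(scanner_df: pd.DataFrame, benchmark_df: pd.DataFrame) -> pd.DataFrame:
    # Join scanner findings to benchmark checks on the check id and the checked path.
    merged = scanner_df.merge(
        benchmark_df,
        on=["check_id", "checked_path"],
        how="outer",
    )
    # Post-processing: drop duplicated rows and fill remaining gaps.
    merged = merged.drop_duplicates()
    merged = merged.fillna({"result": "missing"})
    return merged
```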
The full pipeline is as follows:
flowchart TD;
subgraph scanner
direction LR;
A[Load Scan Result] --> scan_parse[Parse Results JSON]
scan_parse --> impute[Impute Check ID]
impute --> tab_scan[Tabulate Results]
end
tab_scan --> merge{Merge <br/> dataframes}
subgraph benchmark
B[Load Benchmark] --> tab_bench[Tabulate Benchmark]
end
tab_bench --> merge
merge --> drop_redundants[Drop Redundant Checks]
drop_redundants --> drop_dups[Drop Duplicates]
drop_dups --> single_name[Unify Names]
single_name --> cat[Categorize Check]
cat --> oos[Filter Out of Scope Results]
oos --> fill_na[Fill NAs]
fill_na --> classify[Classify Benchmark Result]
classify --> Z([Done])