Weilin Xu, Yanjun Qi, and David Evans
University of Virginia
The project requires several external libraries:
- A modified version of pdfrw for parsing PDF files at https://github.com/mzweilin/pdfrw
- Cuckoo Sandbox v1.2 as the oracle at https://github.com/cuckoosandbox/cuckoo/releases/tag/1.2
- Target classifier PDFrate-Mimicus at https://github.com/srndic/mimicus
- Target classifier Hidost at https://github.com/srndic/hidost
Copy the template and change to your own configuration.
cp project.conf.template project.conf
vim project.conf
First, start the centralized detection agent with the pre-defined malware signatures.
$ ./utils/detection_agent_server.py ./utils/36vms_sigs.pickle
Second, run a program to select several benign PDF files as the external genome.
$ ./utils/generate_ext_genome.py [classifier_name] [benign_sample_folder] [file_number]
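To make the selection step concrete, here is a minimal sketch of one way to pick the external genome. It assumes the benign files have already been scored by the target classifier and that the lowest-scoring (most benign-looking) files are kept; whether the actual script ranks by score or samples differently is not stated here. The function name `select_benign_samples` and the score dictionary are hypothetical.

```python
from typing import Dict, List


def select_benign_samples(scores: Dict[str, float], file_number: int) -> List[str]:
    """Return the `file_number` paths with the lowest classifier scores.

    Assumption: a lower score means the classifier considers the file
    more benign. `scores` maps a PDF path to its score.
    """
    if file_number < 0:
        raise ValueError("file_number must be non-negative")
    ranked = sorted(scores, key=scores.get)  # ascending by score
    return ranked[:file_number]


# Hypothetical scores for three benign PDFs.
example_scores = {"a.pdf": 0.30, "b.pdf": 0.05, "c.pdf": 0.12}
chosen = select_benign_samples(example_scores, 2)
```

In this example `chosen` holds the two paths with the smallest scores; the selected files would then be copied into the folder passed later as `[ext_genome_folder]`.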
Now we can start the main program ./gp.py with a long list of arguments. The helper script ./batch.py should be helpful in large-scale experiments.
./batch.py [classifier_name] [ext_genome_folder] [round_id]
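For repeated experiments, the invocation above can be generated programmatically. The sketch below only builds the argument lists in the documented order (classifier name, external genome folder, round id); it does not add any flags. The helper name `batch_commands`, the folder name, and the use of integer round ids are assumptions made for illustration.

```python
from typing import Iterable, List


def batch_commands(classifier_name: str,
                   ext_genome_folder: str,
                   round_ids: Iterable[int]) -> List[List[str]]:
    """Build one ./batch.py argument list per round id.

    Assumption: round ids are simple integers; the README does not
    specify their format.
    """
    return [
        ["./batch.py", classifier_name, ext_genome_folder, str(round_id)]
        for round_id in round_ids
    ]


# Hypothetical folder name; replace it with your own external genome folder.
commands = batch_commands("pdfrate", "ext_genome_folder", [1, 2, 3])
```

Each entry can be passed to `subprocess.run(cmd, check=True)` from the project root once the detection agent is running.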
Adding more target classifiers to the framework is trivial.
- Add a wrapper in ./classifiers/, like pdfrate_wrapper.py::pdfrate().
- Implement a fitness function in ./lib/fitness.py, like fitness_pdfrate(), and specify a switch in gp.py.
- Import the wrapper function in ./utils/detection_agent_server.py, like pdfrate(), and extend query_classifier() so that the main program can call the detector through lib.detector.query_classifier().
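The three steps above can be sketched as follows. This is a self-contained illustration, not the project's code: the classifier name `myclf`, the `CLASSIFIERS` registry, the `LOW_SCORE` penalty, the 0.5 threshold, and the `still_malicious` argument (standing in for the oracle's verdict) are all assumptions, and the real signatures of `fitness_pdfrate()` and `query_classifier()` may differ.

```python
from typing import Callable, Dict, List

# Step 1 (hypothetical): a wrapper such as classifiers/myclf_wrapper.py::myclf().
# It takes PDF paths and returns one maliciousness score in [0, 1] per file.
def myclf(pdf_paths: List[str]) -> List[float]:
    # Dummy scoring so the sketch runs without a real model.
    return [0.9 if "mal" in path else 0.1 for path in pdf_paths]


# Step 3 (assumed design): the detection agent dispatches on a classifier name.
CLASSIFIERS: Dict[str, Callable[[List[str]], List[float]]] = {"myclf": myclf}


def query_classifier(name: str, pdf_paths: List[str]) -> List[float]:
    if name not in CLASSIFIERS:
        raise ValueError("unknown classifier: %s" % name)
    return CLASSIFIERS[name](pdf_paths)


# Step 2 (assumed design): variants that lost their malicious behaviour get a
# fixed penalty; the others score higher the further they fall below the
# detection threshold.
LOW_SCORE = -2.0


def fitness_myclf(pdf_paths: List[str],
                  still_malicious: List[bool],
                  threshold: float = 0.5) -> List[float]:
    scores = query_classifier("myclf", pdf_paths)
    return [
        threshold - score if keeps_behaviour else LOW_SCORE
        for score, keeps_behaviour in zip(scores, still_malicious)
    ]


fitness_values = fitness_myclf(
    ["mal_a.pdf", "clean_b.pdf", "mal_c.pdf"],
    [True, True, False],
)
```

Under these assumptions a positive fitness value means the variant is classified as benign while still behaving maliciously, which is the condition an evasion search would look for.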