python preprocess.py
preprocess.py generates two files, train and test, which are the inputs to libSVM.
- Details
- For fields written as float or int, I applied normalization.
- For fields written in words, such as attack categories, I applied one-hot encoding to represent each category.
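For readers unfamiliar with the libSVM input format, each sample is one line of the form `label index:value ...`, with 1-based feature indices and zero-valued features optionally omitted. The sketch below shows how an already normalized and one-hot encoded feature vector could be written in that form. It is not the code in preprocess.py; the helper name `to_libsvm_line` and the sample values are assumptions made for illustration.

```python
def to_libsvm_line(label, features):
    """Format one sample as 'label idx:val ...' (1-based indices, zeros omitted)."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            # ':g' drops trailing zeros but keeps only 6 significant digits.
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


# Hypothetical sample: one normalized numeric feature followed by a
# one-hot group of three categories, with class label 2.
sample = [0.25, 0.0, 1.0, 0.0]
line = to_libsvm_line(2, sample)
print(line)  # 2 1:0.25 3:1
```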
python preprocess_csv.py
preprocess_csv.py generates two CSV files, train.csv and test.csv, under the preprocess directory.
- Details
- Map attack category to multiclass label
- Drop rows containing NA (there appear to be none)
- Min-max normalization
- One-hot encoding
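The four steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code in preprocess_csv.py: the `LABELS` mapping, the function name `preprocess`, and the column names in the usage example are hypothetical, and rows are represented as dictionaries rather than read from a CSV file.

```python
# Hypothetical mapping from attack category to an integer class label.
LABELS = {"Normal": 0, "DoS": 1, "Exploits": 2}


def preprocess(rows, numeric_cols, categorical_cols, label_col):
    """Apply the four steps to a list of dict rows and return new dict rows."""
    # Drop rows containing NA (represented here as None or an empty string).
    rows = [r for r in rows if all(v not in (None, "") for v in r.values())]

    # Column ranges and category lists are computed from the remaining rows.
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
              for c in numeric_cols}
    cats = {c: sorted({r[c] for r in rows}) for c in categorical_cols}

    out = []
    for r in rows:
        new = {}
        # Min-max normalization to [0, 1]; a constant column maps to 0.0.
        for c in numeric_cols:
            lo, hi = ranges[c]
            new[c] = 0.0 if hi == lo else (r[c] - lo) / (hi - lo)
        # One-hot encoding: one 0/1 column per category value.
        for c in categorical_cols:
            for value in cats[c]:
                new[f"{c}_{value}"] = 1 if r[c] == value else 0
        # Map the attack category to a multiclass label.
        new["label"] = LABELS[r[label_col]]
        out.append(new)
    return out


# Usage with hypothetical rows; the third row is dropped because of the NA.
rows = [
    {"dur": 0.0, "proto": "tcp", "attack_cat": "Normal"},
    {"dur": 10.0, "proto": "udp", "attack_cat": "DoS"},
    {"dur": None, "proto": "tcp", "attack_cat": "DoS"},
]
result = preprocess(rows, ["dur"], ["proto"], "attack_cat")
```

Computing the ranges and category lists on the training rows and reusing them for the test rows would keep both files on the same scale and with the same columns; whether preprocess_csv.py does this is not stated here.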
The current implementation of analyze.py takes one or two .csv files in the UNSW_NB15 format as input.
- For one file, comment out the comparison section. Running analyze.py outputs analysis, which contains the average and standard deviation of each continuous data category.
- For two files, running analyze.py outputs analysis, which contains the aforementioned information for both files, as well as the difference between the two (both by value and by value / std).
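To make the two kinds of difference concrete, here is a minimal sketch of the statistics for a single continuous column. It is not taken from analyze.py: the function names are hypothetical, the population standard deviation is used, and the difference is scaled by the first file's standard deviation. Both choices are assumptions, since the text above does not specify them.

```python
import statistics


def summarize(values):
    """Return (mean, population standard deviation) of one column."""
    return statistics.mean(values), statistics.pstdev(values)


def compare(values_a, values_b):
    """Return the difference of means by value and by value / std."""
    mean_a, std_a = summarize(values_a)
    mean_b, _ = summarize(values_b)
    diff = mean_b - mean_a
    # Assumption: scale by the standard deviation of the first file.
    scaled = diff / std_a if std_a else float("nan")
    return diff, scaled


# Hypothetical values of the same column taken from two files.
col_a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
col_b = [4.0, 6.0, 6.0, 6.0, 7.0, 7.0, 9.0, 11.0]
mean_a, std_a = summarize(col_a)
diff, scaled = compare(col_a, col_b)
```

The scaled difference expresses the shift between the two files in units of standard deviation, which makes columns with very different ranges comparable.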
- Running preprocess.py requires the ordered_set module. If running the following command
pip install ordered_set
throws the error
AttributeError: module 'lib' has no attribute 'X509_V_FLAG_CB_ISSUER_CHECK'
simply edit the crypto.py file mentioned in the stack trace and comment out the offending line with #. Then, if
pip install ordered_set
throws the error
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
uninstall pip with
sudo apt remove python3-pip
and run
sudo python3 get-pip.py
Reboot and run
pip install pyopenssl --upgrade
Then, you should be able to install ordered_set and thus run preprocess.py.
- Before training / grid search, run
apt-get update
and
apt install libsvm-tools
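
Once libsvm-tools is installed, a grid search over the cost and gamma parameters can be driven from Python by building one svm-train command per parameter pair. The sketch below only constructs the commands. The function name `grid_commands`, the exponent ranges, and the fold count are assumptions for illustration, not settings taken from this repository; `-c`, `-g`, and `-v` are the svm-train options for cost, gamma, and n-fold cross validation.

```python
import itertools


def grid_commands(train_file, log2c=range(-5, 16, 2),
                  log2g=range(3, -16, -2), folds=5):
    """Yield one svm-train cross-validation command per (C, gamma) pair."""
    for c_exp, g_exp in itertools.product(log2c, log2g):
        yield [
            "svm-train",
            "-c", str(2.0 ** c_exp),   # cost
            "-g", str(2.0 ** g_exp),   # gamma of the kernel
            "-v", str(folds),          # n-fold cross validation
            train_file,
        ]


# "train" is the file produced by preprocess.py.
commands = list(grid_commands("train"))
print(len(commands))            # 110
print(" ".join(commands[0]))    # svm-train -c 0.03125 -g 8.0 -v 5 train
```

Each command could then be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the reported cross-validation accuracy parsed from its output to pick the best pair.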