
Some issues about reproducing the project #7

Open
HideLakitu opened this issue Oct 12, 2024 · 4 comments

Comments


HideLakitu commented Oct 12, 2024

Hello,
I recently wanted to run a realistic experiment and came across this repo; it looks appealing for its lightness: a small codebase and a small dataset, so the pipeline seems easy to follow. I plan to run FLAD on Google Colab with Drive, since I'm on Windows rather than Linux (I've only practiced a few basic commands in a virtual machine before).

So, about the traffic pre-processing part: you give the instruction python3 lucid_dataset_parser.py --dataset_type DOS2019 --dataset_folder /path_to/dataset_folder/ --packets_per_flow 10 --dataset_id DOS2019 --traffic_type all --time_window 10, which means I should also download the previously released lucid repo, right? Then use the generated files to train with FLAD's functions.

I also wonder whether it is possible to run everything on Colab rather than locally.

@doriguzzi
Owner

Hi,
yes, you need the LUCID parser to convert the traffic traces of the dataset into traffic samples. Once that is done, you won't need LUCID's code anymore.
I think both LUCID and FLAD can be executed on Colab, but I've never tried it myself.

All the best,
Roberto

@HideLakitu
Author

HideLakitu commented Nov 7, 2024


Thanks for the reply! I did manage to start training with those subfolders, but now I'm stuck on preprocessing the dataset. When I try to use my own customized data, I run python3 lucid_dataset_parser.py --dataset_type DOS2019 --dataset_folder /path_to/dataset_folder/ --packets_per_flow 10 --dataset_id DOS2019 --traffic_type all --time_window 10 and then python3 lucid_dataset_parser.py --preprocess_folder /path_to/dataset_folder/, but only a handful of samples end up in the train, val and test .hdf5 files; in other words, there is a significant mismatch between the scale of the input and the output.

Specifically, after downloading part of CIC-DDoS2019, the raw files are named like SAT-03-11-2018_03, so I added the .pcap suffix, opened them in Wireshark, and exported a portion of the data.

For example, below is the output of preprocessing 80,000 randomly chosen packets; you can see that only a single-digit number of samples was generated: Train/Val/Test sizes: (9,1,2). Meanwhile the intermediate .data file seems normal, since its size is proportional to the original traffic data.

So how should I deal with this? Which part of lucid_dataset_parser.py should I modify? In normal cases I'd expect 80,000 packets to yield at least ~2,000 training samples (just an estimate).

[Screenshots: Colab notebook run (11-18-2024) and current preprocessing output]

@doriguzzi
Owner

Hi,
if you take a look at the output of the command python3 lucid_dataset_parser.py --dataset_type DOS2019 --dataset_folder /path_to/dataset_folder/ --packets_per_flow 10 --dataset_id DOS2019 --traffic_type all --time_window 10, you will notice that there are only 6 benign flows. Therefore, when you execute the second step, the balancing method reduces the number of DDoS flows from 38837 to 6 to create a balanced dataset.
You can see that the number of flows goes from (tot,ben,ddos) = (38843,6,38837) to (12,6,6).
To solve this issue, I suggest adding some pcaps with benign traffic in the same folder and restarting the whole process from scratch.
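The balancing step described above can be sketched as follows (a minimal illustration with a hypothetical balance_flows function, not LUCID's actual code):

```python
import random

def balance_flows(benign, ddos, seed=0):
    """Sketch of the balancing behaviour: the majority class is downsampled
    to the size of the minority class (illustration only, not LUCID's code)."""
    n = min(len(benign), len(ddos))
    rng = random.Random(seed)
    return rng.sample(benign, n) + rng.sample(ddos, n)

# With 6 benign and 38837 DDoS flows, only 12 flows survive in total,
# which is why the train/val/test .hdf5 files end up nearly empty.
balanced = balance_flows(["benign"] * 6, ["ddos"] * 38837)
print(len(balanced))  # 12
```

This is why adding benign pcaps (raising the minority-class count) is the fix, rather than changing the parser itself.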

@HideLakitu
Author

HideLakitu commented Dec 13, 2024


Thanks for the reply! The issue is solved: it was indeed caused by having almost no benign traffic in the pcap file. Preprocessing no longer yields only a handful of training samples, and the out-of-bounds error (AxisError) is gone too. BTW, I still can't quite figure out why that bug appears when the traffic consists almost entirely of DDoS packets.

I feel a little embarrassed to ask again, but in the segments of CIC-DDoS2019 I checked (some SAT-03-11-2018_023-xx files) there is very little benign traffic: only a few hundred benign flows out of hundreds of thousands per file. So can I just capture some random traffic in pcap format on my own? Traffic I capture myself in real time should be almost entirely benign, and I could then merge it with a file full of attack traffic.
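For the record, merging a self-captured benign trace with an attack trace is usually done with Wireshark's mergecap tool (mergecap -w merged.pcap benign.pcap attack.pcap). As an illustration of what such a merge involves, here is a minimal sketch for classic little-endian pcap files, using only the Python standard library; read_pcap and merge_pcaps are hypothetical helpers, not part of LUCID or FLAD:

```python
import struct

GHDR = struct.Struct("<IHHiIII")  # magic, ver_major, ver_minor, thiszone, sigfigs, snaplen, linktype
RHDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len

def read_pcap(path):
    """Read a classic little-endian microsecond pcap.
    Returns (global header bytes, list of (ts_sec, ts_usec, raw record bytes))."""
    with open(path, "rb") as f:
        ghdr = f.read(GHDR.size)
        if struct.unpack("<I", ghdr[:4])[0] != 0xA1B2C3D4:
            raise ValueError("not a little-endian microsecond pcap")
        records = []
        while True:
            rhdr = f.read(RHDR.size)
            if len(rhdr) < RHDR.size:
                break
            ts_sec, ts_usec, incl_len, _ = RHDR.unpack(rhdr)
            records.append((ts_sec, ts_usec, rhdr + f.read(incl_len)))
    return ghdr, records

def merge_pcaps(out_path, *in_paths):
    """Concatenate the packets of several pcaps, sorted by timestamp
    (mergecap's default behaviour)."""
    ghdr, merged = None, []
    for path in in_paths:
        hdr, records = read_pcap(path)
        ghdr = ghdr or hdr
        merged.extend(records)
    merged.sort(key=lambda r: (r[0], r[1]))
    with open(out_path, "wb") as f:
        f.write(ghdr)
        for _, _, raw in merged:
            f.write(raw)
```

In practice mergecap is the safer choice, since it also handles pcapng and mixed link types.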

And if this makes sense, should I rewrite the source and destination addresses to the specific IP addresses you mention in LUCID?

DOS2019_FLOWS = {'attackers': ['172.16.0.5'], 'victims': ['192.168.50.1', '192.168.50.4']}
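Assuming LUCID labels a flow as DDoS when its endpoints match the attacker and victim lists above (an assumption about the parser's logic, not verified against its code), traffic you capture yourself would already be labelled benign without rewriting any addresses. A sketch of such a rule:

```python
# Endpoint lists from LUCID's parser for the DOS2019 dataset:
DOS2019_FLOWS = {'attackers': ['172.16.0.5'], 'victims': ['192.168.50.1', '192.168.50.4']}

def label_flow(src_ip, dst_ip, flows=DOS2019_FLOWS):
    """Hypothetical labelling rule (an assumption, not LUCID's verified code):
    a flow is DDoS when it runs between a known attacker and a known victim."""
    endpoints = {src_ip, dst_ip}
    if endpoints & set(flows['attackers']) and endpoints & set(flows['victims']):
        return 1  # ddos
    return 0      # benign

print(label_flow('172.16.0.5', '192.168.50.1'))  # 1: attacker -> victim
print(label_flow('10.0.0.2', '8.8.8.8'))         # 0: self-captured traffic stays benign
```

Under this assumption, rewriting the IPs in your own capture would only be needed if you wanted those flows labelled as attack traffic.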

Here are two screenshots: the first shows the benign file from the sample-dataset in FLAD; the second is my own capture, made by just pressing the capture button in Wireshark (on the WLAN interface) and exporting as .pcap. I didn't apply any filters (protocol, length limits, etc.).

[Screenshots: benign file from FLAD's sample-dataset; own Wireshark capture]
