Commit
1 parent 28ef588, commit def7a10
Showing 5 changed files with 68 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,41 @@ | ||
# heiDGAF - DGA Finder | ||
|
||
> ML based DNS analyzer to detect Domain Generation Algorithms (DGAs) and tunneling of malicious actors. | ||
## | ||
> ML-based DNS analyzer to detect Domain Generation Algorithms (DGAs), tunneling, and data exfiltration by malicious actors. | ||
## Getting Started | ||
|
||
```sh | ||
python -m venv .venv | ||
source .venv/bin/activate | ||
pip install . | ||
|
||
heidgaf -h | ||
``` | ||
|
||
Run your analysis: | ||
|
||
```sh | ||
heidgaf process start -r data/... | ||
``` | ||
|
||
### Data | ||
|
||
Currently, we support the following data format: | ||
|
||
`{{ .timestamp }} {{ .return_code }} {{ .client_ip }} {{ .server_ip }} {{ .query }} {{ .type }} {{ .answer }} {{ .size }}b` | ||
|
||
For training our models, we rely on the following data sets: | ||
|
||
- CICBellDNS2021 | ||
- DGTA Benchmark | ||
- Majestic Million | ||
|
||
### Exploratory Data Analysis (EDA) | ||
|
||
In the folder `./example`, we conducted an Exploratory Data Analysis (EDA) to verify the features of interest for our application. | ||
|
||
## Literature | ||
|
||
## Exploratory Data Analysis (EDA) | ||
Based on the following work, we implement heiDGAF to find malicious behaviour in DNS requests. | ||
|
||
In the folder `./example`, we conducted an Exploratory Data Analysis (EDA) to verify the features of interest for our application. | ||
- EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis | ||
- |
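
The log line format listed under "Data" in the README diff above can be split into its eight fields with a few lines of Python. The following is only a minimal sketch, not heiDGAF's own parser: the `DNSRecord` type, the `parse_line` helper, and the sample line are hypothetical, and it assumes that fields are separated by whitespace and that no field contains whitespace itself.

```python
from typing import NamedTuple


class DNSRecord(NamedTuple):
    """One log line; field names follow the README template (hypothetical type)."""

    timestamp: str
    return_code: str
    client_ip: str
    server_ip: str
    query: str
    record_type: str  # the template calls this field `type`
    answer: str
    size: int  # size in bytes, parsed from the trailing "<n>b" token


def parse_line(line: str) -> DNSRecord:
    """Split a whitespace-separated log line into its eight fields."""
    parts = line.split()
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}: {line!r}")
    *fields, size_token = parts
    if not size_token.endswith("b"):
        raise ValueError(f"size field must end in 'b': {size_token!r}")
    return DNSRecord(*fields, size=int(size_token[:-1]))


# Made-up sample line using documentation IP ranges.
record = parse_line(
    "2024-01-01T12:00:00Z NXDOMAIN 192.0.2.10 198.51.100.53 abc123.example.com A - 87b"
)
print(record.query, record.size)
```

Rejecting lines with the wrong field count early keeps malformed input out of later analysis steps; a real parser may need to handle answers that contain spaces.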
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,24 @@ | ||
import polars as pl | ||
import logging | ||
|
||
from heidgaf import ReturnCode | ||
from heidgaf.pre import Analyzer | ||
|
||
|
||
class IPAnalyzer(Analyzer): | ||
    def __init__(self) -> None: | ||
        super().__init__() | ||
|
||
    @classmethod | ||
    def run(cls, data): | ||
        # Keep only responses whose return code is not NOERROR; | ||
        # drop queries equal to "|" and single-label names | ||
        df = ( | ||
            data.filter(pl.col("query") != "|") | ||
            .filter(pl.col("return_code") != ReturnCode.NOERROR.value) | ||
            .filter(pl.col("query").str.split(".").list.len() != 1) | ||
        ) | ||
        # Get frequency count of distinct client IP addresses and DNS servers | ||
        client_ip_frequency = df.select([ | ||
            pl.col("client_ip").value_counts() | ||
        ]) | ||
        logging.debug(f'Client IP freq: {client_ip_frequency}') | ||
        dns_server_frequency = df.select([ | ||
            pl.col("dns_server").value_counts() | ||
        ]) | ||
        logging.debug(f'DNS server freq: {dns_server_frequency}') |
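
To make the filtering and counting in `IPAnalyzer.run` concrete, here is a dependency-free sketch of the same steps on plain dictionaries. It is not the polars implementation from the diff: the `ip_frequencies` helper and the sample rows are hypothetical, the row keys mirror the column names used in the code above, and the string `"NOERROR"` is an assumed stand-in for `ReturnCode.NOERROR.value`.

```python
from collections import Counter

NOERROR = "NOERROR"  # assumption: stand-in for ReturnCode.NOERROR.value


def ip_frequencies(rows):
    """Filter rows like IPAnalyzer.run, then count client IPs and DNS servers."""
    kept = [
        row
        for row in rows
        if row["query"] != "|"  # skip queries equal to "|"
        and row["return_code"] != NOERROR  # keep error responses only
        and len(row["query"].split(".")) != 1  # skip single-label names
    ]
    client_counts = Counter(row["client_ip"] for row in kept)
    server_counts = Counter(row["dns_server"] for row in kept)
    return client_counts, server_counts


# Made-up rows; only the first two pass all three filters.
rows = [
    {"query": "abc.example.com", "return_code": "NXDOMAIN",
     "client_ip": "192.0.2.1", "dns_server": "198.51.100.1"},
    {"query": "xyz.example.com", "return_code": "NXDOMAIN",
     "client_ip": "192.0.2.1", "dns_server": "198.51.100.1"},
    {"query": "example.org", "return_code": "NOERROR",
     "client_ip": "192.0.2.2", "dns_server": "198.51.100.1"},
    {"query": "localhost", "return_code": "NXDOMAIN",
     "client_ip": "192.0.2.3", "dns_server": "198.51.100.1"},
    {"query": "|", "return_code": "SERVFAIL",
     "client_ip": "192.0.2.4", "dns_server": "198.51.100.1"},
]

clients, servers = ip_frequencies(rows)
print(clients.most_common())
print(servers.most_common())
```

A client that produces many failed multi-label lookups stands out in `clients`, which is the kind of signal the frequency counts in the analyzer appear to be aimed at.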