This repo trains a Graph Attention Network (GAT) for anti-money laundering (AML) detection.
Dataset: https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml
The main dependencies are NumPy, PyTorch, PyG (PyTorch Geometric), and pandas.
Install the dependencies with pip, or with conda if you prefer.
PyG can be installed with the command below:
pip install torch_geometric
Create the data/raw folder before you run the scripts.
Put the .csv file into the raw folder. When you run train.py, dataset.py creates a "processed" folder containing the processed data.
Make sure the directory layout looks like this:
├── data
│ ├── raw
├── dataset.py
├── model.py
└── train.py
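As a convenience, the folder setup above can be sketched in Python. This is a minimal sketch, not part of the repo: the helper names `prepare_data_dirs` and `find_raw_csvs` are hypothetical, and it assumes the project root is the directory that contains dataset.py, model.py and train.py.

```python
from pathlib import Path


def prepare_data_dirs(project_root):
    """Create <project_root>/data/raw if it is missing and return its path."""
    raw_dir = Path(project_root) / "data" / "raw"
    # parents=True also creates "data"; exist_ok=True makes reruns harmless.
    raw_dir.mkdir(parents=True, exist_ok=True)
    return raw_dir


def find_raw_csvs(raw_dir):
    """Return the .csv files currently in the raw folder, sorted by name."""
    return sorted(Path(raw_dir).glob("*.csv"))


if __name__ == "__main__":
    raw_dir = prepare_data_dirs(".")  # assumption: run from the project root
    csv_files = find_raw_csvs(raw_dir)
    if not csv_files:
        print(f"No .csv found in {raw_dir}; copy the Kaggle file there first.")
    else:
        print("Raw files:", [p.name for p in csv_files])
```

Running it before train.py gives an early warning if the Kaggle .csv has not been copied into data/raw yet.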
The accompanying Jupyter Notebook explains the feature engineering implemented in this repo and gives a short summary.
It also covers the data visualization, the preprocessing pipeline, and the dataset design details.
All data preprocessing is done in dataset.py, which uses torch_geometric.data.InMemoryDataset as the dataset framework.
Please note that dataset.py currently supports processing a single file only.
Please change the path in line 8 to your local path, e.g. '/path/to/AntiMoneyLaunderingDetectionWithGNN/data'
The hyperparameters used in train.py:
epoch = 100
train_batch_size = 256
test_batch_size = 256
learning_rate = 0.0001
optimizer: SGD
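To show how these values fit together, here is a minimal PyTorch sketch of one training step. It is not the code in train.py: the stand-in linear model, the synthetic batch with 8 features, and the choice of `BCEWithLogitsLoss` are all assumptions made for illustration. Only the hyperparameter values and the SGD optimizer come from the list above.

```python
import torch
from torch import nn

# Hyperparameters as listed in this README.
EPOCHS = 100
TRAIN_BATCH_SIZE = 256
TEST_BATCH_SIZE = 256
LEARNING_RATE = 0.0001

# Assumption: a tiny linear model stands in for the real GAT in model.py.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
# Assumption: binary classification (laundering vs. not) with a logit output.
criterion = nn.BCEWithLogitsLoss()

# Synthetic batch of TRAIN_BATCH_SIZE samples, for illustration only.
features = torch.randn(TRAIN_BATCH_SIZE, 8)
labels = torch.randint(0, 2, (TRAIN_BATCH_SIZE, 1)).float()

# One optimization step; a real run would repeat this over all batches
# for EPOCHS epochs.
model.train()
optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
```

With plain SGD and a learning rate this small, convergence can be slow, so it is worth monitoring the loss across epochs before changing other settings.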
This repo uses a Graph Attention Network as the backbone model; the model can be changed in model.py.
Some of the feature engineering in this repo is based on the papers below, which are highly recommended reading:
- Weber, M., Domeniconi, G., Chen, J., Weidele, D. K. I., Bellei, C., Robinson, T., & Leiserson, C. E. (2019). Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv preprint arXiv:1908.02591.
- Johannessen, F., & Jullum, M. (2023). Finding Money Launderers Using Heterogeneous Graph Neural Networks. arXiv preprint arXiv:2307.13499.