This repo is a python implementation of smart contract vulnerability detection using graph neural networks. In this research work, we focus on detecting three kinds of smart contract vulnerabilities (i.e., reentrancy, timestamp dependence, and infinite loop). All of the infinite loop types we concerned are implemented by Class C/C++ of VNT, while the smart contracts of reentrancy and timestamp dependence are wirrten by Solidity, i.e., Ethereum smart contract.
The ESC dataset consists of 40,932 smart contracts from Ethereum. Among the functions, around 5,013 functions possess at least one invocation to call.value, making them potentially affected by the reentrancy vulnerability. Around 4,833 functions contain the block.timestamp statement, making them susceptible to the timestamp dependence vulnerability. The VSC dataset contains all the available 4,170 smart contracts collected from the VNT Chain network, which overall contain 13,761 functions. VNT Chain is an experimental public blockchain platform proposed by companies and universities from Singapore, China, and Australia.
- python3
- TensorFlow1.14.0
- keras2.2.4 with TensorFlow backend
- sklearn for model evaluation
- docopt as a command-line interface parser
- go-vnt as a vntchain platform support
- go-ethereum as a ethereum platform support
Run the following script to install the required packages.
pip install --upgrade pip
pip install --upgrade tensorflow
pip install keras
pip install scikit-learn
pip install docopt
For each dataset, we randomly pick 80% contracts as the training set while the remainings are utilized for the testing set. In the comparison, metrics accuracy, recall, precision, and F1 score are all involved. In consideration of the distinct features of different platforms, experiments on reentrancy vulnerability and timestamp dependence vulnerability are conducted on the ESC dataset, while experiments on infinite loop vulnerability detection are conducted on the VSC dataset.
Original smart contract source code stored in google drive:
Ethereum smart contracts: Etherscan_contract
Vntchain smart contacts: Vntchain_contract
Here, we provide a tool for crawling source code from Etherscan, which is developed in Aug 2018. If out of date, you can refer and make the corresponding improvements.
All of the smart contract source code, graph data, and training data in these folders in the following structure respectively.
${GGNNSmartVulDetector}
├── data
│ ├── loops
│ │ └── source_code
│ │ └── graph_data
│ ├── timestamp
│ │ └── source_code
│ │ └── graph_data
│ └── reentrancy
│ └── source_code
│ └── graph_data
├── features
├── loops
├── timestamp
└── reentrancy
├── train_data
├── loops
│ └── train.json
│ └── vaild.json
├── timestamp
│ └── train.json
│ └── vaild.json
└── reentrancy
└── train.json
└── vaild.json
data/reentrancy/source_code
: This is the data of original smart contracts.data/reentrancy/graph_data
: This is the graph data, consisting edges and nodes, which are extracted by our AutoExtractor.graph_data/edge
: It includes all edges and edge of each smart contract.graph_data/node
: It includes all nodes and node of each smart contract.features/reentrancy
: It includes all the reentrancy features of each smart contract extracted by our model.train_data/reentrancy/train.json
: This is the training data of all the smart contract.train_data/reentrancy/valid.json
: This is the testing data of all the smart contract.
The tools and models are as follows:
${GGNNSmartVulDetector}
├── tools
│ ├── remove_comment.py
│ ├── construct_fragment.py
│ ├── reentrancy/AutoExtractGraph.py
│ └── reentrancy/graph2vec.py
AutoExtractGraph.py
- All functions in the smart contract code are automatically split and stored.
- Find the relationships between functions.
- Extract all smart contracts source code into features of nodes and edges.
python AutoExtractGraph.py
graph2vec.py
- Feature ablation.
- Converts graph into vectors.
python graph2vec.py
- To run the program, use this command: python GNNSCModel.py.
Examples:
python GNNSCModel.py --random_seed 9930 --thresholds 0.45
We would like to point that the data processing code is available here. For the complete codebase, please email to [email protected]. And, the code is adapted from GGNN. Technical questions can be addressed to [email protected], [email protected], and [email protected].
@inproceedings{ijcai2020-454,
title = {Smart Contract Vulnerability Detection using Graph Neural Network},
author = {Zhuang, Yuan and Liu, Zhenguang and Qian, Peng and Liu, Qi and Wang, Xiang and He, Qinming},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on
Artificial Intelligence, {IJCAI-20}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {3283--3290},
year = {2020},
}
- VNT Document. vnt-document.
- Li Y, Tarlow D, Brockschmidt M, et al. Gated graph sequence neural networks. ICLR, 2016. GGNN