baishuotong/AIPL

AIPL

This repository contains the open-source code for the article “Improving Issue-PR Link Prediction via Knowledge-aware Heterogeneous Graph”.

Introduction

In this work, we design an approach, named AIPL, that predicts Issue-PR links on GitHub. It uses a heterogeneous graph to model multi-type GitHub data and employs metapath-based techniques to capture the information passed among the different data types. Given a pair consisting of an issue and a PR, AIPL suggests whether the two are likely to be linked.

Environment

AIPL is implemented in PyTorch and was run on a server equipped with an NVIDIA GTX 1060 GPU.

Dependencies

  • Python 3.7.3
  • PyTorch 1.13.1
  • NumPy 1.21.5
  • Pandas 1.3.4
  • scikit-learn 1.0.2
  • scipy 1.7.3
  • DGL 0.6.1
  • NetworkX 2.6.3

File Introduction

Dataset

We release our annotated dataset in this directory.

facebook/react & vuejs/vue

Annotated dataset based on the repositories facebook/react and vuejs/vue.

  • Index: Information of nodes and edges in the heterogeneous graph
  • Features: Embeddings of nodes
  • Training set & Test set: Annotated dataset
  • adjM.npz: The adjacency matrix of the heterogeneous graph

Note that the metapath files are too large to upload to this repository. However, all of the required files can be generated by running construct_metapath.py.

Code

  • baseline: The code of our baselines, including iLinker, A-M, random walk, metapath2vec, R-GCN, GTN, Simple-HGN, HGT, HAN, SeHGNN, and MECCH.
  • AIPL: The code of AIPL. Please read the following introduction for a better understanding.

Code Functions

Our method involves three stages: building the heterogeneous graph, constructing metapaths, and training the graph-based model.
The first step is to run build_graph.py. The second step is to run construct_metapath.py. The third step is to run AIPL_main.py.
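
The three steps above could be chained from a small driver script. This is only a sketch: it assumes the scripts take no required command-line arguments and are run from the directory that contains them, neither of which is stated here. The helper names `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Script names and their order are taken from the steps described above.
PIPELINE = ["build_graph.py", "construct_metapath.py", "AIPL_main.py"]


def pipeline_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step, in order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    """Run each step in turn and stop at the first one that fails."""
    for command in pipeline_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` stops at the first failing step, so a later step never runs against missing graph or metapath files.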
The detailed explanations are as follows:
build_graph.py
This script constructs a heterogeneous graph and generates node features for users, repositories (repos), issues, and pull requests (PRs).
It loads data for the relationships user-repo, user-issue, user-PR, repo-repo, repo-issue, repo-PR, issue-issue, issue-PR, and PR-PR from the corresponding directories and creates an adjacency matrix (adjM).
Additionally, it extracts feature vectors, such as title vectors, from CSV files to create features for repos, issues, and PRs.
The script then saves the adjacency matrix and node features as NumPy arrays for the later steps.
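
As an illustration of how typed edges can be folded into a single adjacency matrix, here is a minimal sketch. It is not the code in build_graph.py: the node counts, the edge lists, the helper names, and the choice to store every edge symmetrically are all assumptions made for the example.

```python
import numpy as np

# Hypothetical node counts; the real ones come from the dataset's index files.
NUM_NODES = {"user": 2, "repo": 1, "issue": 2, "pr": 2}

# Hypothetical edges: (source_type, target_type) -> list of local id pairs.
EDGES = {
    ("user", "issue"): [(0, 0), (1, 1)],
    ("user", "pr"): [(0, 0), (1, 1)],
    ("repo", "issue"): [(0, 0), (0, 1)],
    ("repo", "pr"): [(0, 0), (0, 1)],
    ("issue", "pr"): [(0, 0)],
}


def type_offsets(num_nodes):
    """Give each node type a contiguous block of global indices."""
    offsets, start = {}, 0
    for node_type, count in num_nodes.items():
        offsets[node_type] = start
        start += count
    return offsets, start


def build_adjacency(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix over all node types."""
    offsets, total = type_offsets(num_nodes)
    adj = np.zeros((total, total), dtype=np.int8)
    for (src_type, dst_type), pairs in edges.items():
        for src, dst in pairs:
            i = offsets[src_type] + src
            j = offsets[dst_type] + dst
            adj[i, j] = 1
            adj[j, i] = 1  # assumption: edges are treated as undirected
    return adj


adj_m = build_adjacency(NUM_NODES, EDGES)
```

With this layout, the type of any node can be recovered from its global index alone, which keeps the later metapath step independent of the per-type files.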
construct_metapath.py
The code first loads data from various edge and index files, including user-repo, user-issue, user-pr, repo-repo, repo-issue, repo-pr, issue-issue, issue-pr, and pr-pr.
It then loads the adjacency matrices and organizes them into lists by node type: users, repositories, issues, and PRs.
Next, the code generates expected metapaths based on predefined patterns. These metapaths are then mapped to corresponding indices and stored in pickle files, numpy arrays, and adjacency lists for further analysis and processing.
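
To make the idea of a metapath instance concrete, the sketch below enumerates every path that matches one pattern, Issue-User-PR. The pattern, the toy edge lists, and the function names are hypothetical; the patterns actually used by construct_metapath.py are not listed here.

```python
from collections import defaultdict


def adjacency_list(edges):
    """Group (source, target) pairs into a source -> [targets] mapping."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def issue_user_pr_instances(issue_user_edges, user_pr_edges):
    """Enumerate all (issue, user, pr) paths for the Issue-User-PR pattern."""
    users_of_issue = adjacency_list(issue_user_edges)
    prs_of_user = adjacency_list(user_pr_edges)
    instances = []
    for issue in sorted(users_of_issue):
        for user in users_of_issue[issue]:
            # .get avoids creating empty entries for users without PRs.
            for pr in prs_of_user.get(user, []):
                instances.append((issue, user, pr))
    return instances


# Toy data: issues 0-1, users 10-11, PRs 100-102.
issue_user = [(0, 10), (1, 10), (1, 11)]
user_pr = [(10, 100), (11, 101), (11, 102)]
instances = issue_user_pr_instances(issue_user, user_pr)
```

The number of instances grows with the product of neighbor counts along the path, which is consistent with the note above that the metapath files are too large to upload.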
AIPL_main.py
This script covers model training and inference. Users can train and evaluate AIPL by running AIPL_main.py.
The script handles data loading, model setup, training with early stopping, and evaluation using metrics like accuracy, precision, recall, and F1-score.
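
For readers unfamiliar with these pieces, the following sketch shows one common way to compute the four metrics for binary link labels and to stop training once the validation loss stops improving. It is written from scratch for illustration; the `EarlyStopping` class, the `patience` parameter, and the use of validation loss as the monitored quantity are assumptions, not details taken from AIPL_main.py.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = linked)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation loss."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's loss and return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and the loop would break as soon as it returns True.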
Data loading and batching are handled by data.py, preprocess.py, and tools.py in the 'utils' folder.
The AIPL model consists of intra-metapath aggregation, inter-metapath aggregation, and an attention mechanism.
These components are implemented in base_magnn.py and magnn_lp.py under the 'magnn_model' directory, which AIPL_main.py calls directly.
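
The two aggregation stages can be pictured with a deliberately simplified, dependency-free sketch. It assumes a mean encoder for each metapath instance, a plain average across instances, and a fixed `query` vector in place of learned attention parameters. The code in base_magnn.py and magnn_lp.py may differ in all of these respects, and every function name below is hypothetical.

```python
import math


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def intra_metapath(instances):
    """Summarize one metapath: encode each instance as the mean of its
    node features, then average the instance encodings."""
    return mean_vectors([mean_vectors(nodes) for nodes in instances])


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def inter_metapath(summaries, query):
    """Fuse metapath summaries, weighting each by softmax(query . summary)."""
    scores = [sum(q * x for q, x in zip(query, summary)) for summary in summaries]
    weights = softmax(scores)
    fused = [
        sum(weight * summary[k] for weight, summary in zip(weights, summaries))
        for k in range(len(query))
    ]
    return fused, weights


# Toy 2-dimensional features for two hypothetical metapaths.
summary_a = intra_metapath([[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]])
summary_b = intra_metapath([[[0.0, 0.0], [2.0, 0.0]]])
fused, weights = inter_metapath([summary_a, summary_b], query=[1.0, 0.0])
```

Here the second metapath scores higher against the query, so it receives the larger weight in the fused vector. In a trained model the weights would come from learned parameters rather than a hand-picked query.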

You can also set the hyperparameters in this file, including learning_rate, epoch_number, drop_out, the number of attention heads, and the instance encoder.
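
One way such parameters are often exposed is through `argparse`. The sketch below is an assumption about how this could look, not a copy of AIPL_main.py: the flag names, the default values, and the encoder choices are placeholders chosen for the example.

```python
import argparse


def build_parser():
    """Build a parser for the hyperparameters named above.

    All flag names, defaults, and choices are hypothetical placeholders.
    """
    parser = argparse.ArgumentParser(description="Hypothetical AIPL training options")
    parser.add_argument("--learning-rate", type=float, default=0.005)
    parser.add_argument("--epoch-number", type=int, default=100)
    parser.add_argument("--drop-out", type=float, default=0.5)
    parser.add_argument("--num-heads", type=int, default=8,
                        help="number of attention heads")
    parser.add_argument("--instance-encoder", choices=["mean", "linear"],
                        default="mean",
                        help="how each metapath instance is encoded")
    return parser


# Parse an explicit argument list so the example does not read sys.argv.
args = build_parser().parse_args(["--learning-rate", "0.001", "--num-heads", "4"])
```

Options that are not passed keep their defaults, so a single run can override only the values being tuned.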

Example Presentation

  1. Example 1 image
  2. Example 2 image
  3. Example 3 image

Copyright

All copyright of the tool is owned by the author of the paper.
