AIPL

This repository contains the open-source code for the article “Improving Issue-PR Link Prediction via Knowledge-aware Heterogeneous Graph”.

Introduction

In this work, we design AIPL, an approach for predicting Issue-PR links on GitHub. It uses a heterogeneous graph to model multi-type GitHub data and employs metapath-based techniques to capture the information passed between the different types of data. Given an issue and a PR, AIPL suggests whether the two are likely to be linked.

Environment

AIPL is implemented in PyTorch and was run on a server equipped with an NVIDIA GTX 1060 GPU.

Dependencies

Python 3.7.3
PyTorch 1.13.1
NumPy 1.21.5
Pandas 1.3.4
scikit-learn 1.0.2
scipy 1.7.3
DGL 0.6.1
NetworkX 2.6.3
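
One way to confirm that an environment matches the versions above is to compare them with the installed packages. The snippet below is a minimal sketch, not part of this repository: the distribution names (`torch`, `dgl`, and so on) are assumed to be the usual PyPI names, and `installed_versions` and `find_mismatches` are hypothetical helpers.

```python
# Sketch: compare installed package versions with the pinned list above.
# PyPI distribution names are assumed; helper names are hypothetical.
try:
    from importlib import metadata  # available from Python 3.8
except ImportError:  # Python 3.7 would need the importlib_metadata backport
    import importlib_metadata as metadata

PINNED = {
    "torch": "1.13.1",
    "numpy": "1.21.5",
    "pandas": "1.3.4",
    "scikit-learn": "1.0.2",
    "scipy": "1.7.3",
    "dgl": "0.6.1",
    "networkx": "2.6.3",
}


def installed_versions(names):
    """Return {name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {name: (expected, actual)} for every version that differs."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }
```

Calling `find_mismatches(PINNED, installed_versions(PINNED))` returns an empty dictionary when everything matches. The comparison is an exact string match, so a build with a local suffix such as `1.13.1+cu117` would be reported as a mismatch.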

File Introduction

Dataset

We release our annotated dataset in this directory.

facebook/react & vuejs/vue

Annotated dataset based on the repositories facebook/react and vuejs/vue.

Index: information about the nodes and edges of the heterogeneous graph
Features: embeddings of the nodes
Training set & Test set: the annotated dataset
adjM.npz: the adjacency matrix of the heterogeneous graph

Note that the metapath files are too large to upload to this repository. All of them can be regenerated by running construct_metapath.py.

Code

baseline: the code of our baselines, including iLinker, A-M, random walk, metapath2vec, R-GCN, GTN, Simple-HGN, HGT, HAN, SeHGNN, and MECCH.
AIPL: the code of AIPL. Please read the following introduction for a better understanding.

Code Functions

The code of our method covers three tasks: building the heterogeneous graph, constructing the metapaths, and training the graph-based model.
First run build_graph.py, then construct_metapath.py, and finally AIPL_main.py.
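
The three steps can be chained in a small driver script. The following is a sketch only: the script names come from this README, while their location (the current working directory), the lack of command-line arguments, and the helpers `build_commands` and `run_pipeline` are assumptions made for illustration.

```python
# Sketch: run the three AIPL steps in order, stopping at the first failure.
# Script names come from the README; everything else is an assumption.
import subprocess
import sys

STEPS = ["build_graph.py", "construct_metapath.py", "AIPL_main.py"]


def build_commands(python_exe=sys.executable, steps=STEPS):
    """Return one command line (as a list of strings) per pipeline step."""
    return [[python_exe, script] for script in steps]


def run_pipeline(commands):
    """Run each command in turn; check=True raises on a non-zero exit code."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Usage, from the directory that holds the scripts:
#     run_pipeline(build_commands())
```

Stopping on the first failure matters here because each step consumes the files written by the previous one.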
The detailed explanations are as follows:
build_graph.py
This script constructs the heterogeneous graph and generates node features for users, repositories (repos), issues, and pull requests (PRs).
It loads the relationship data (user-repo, user-issue, user-PR, repo-repo, repo-issue, repo-PR, issue-issue, issue-PR, and PR-PR) from the corresponding directories and creates an adjacency matrix (adjM).
It also extracts feature vectors, such as title vectors, from CSV files to create features for repos, issues, and PRs.
Finally, it saves the adjacency matrix and the node features as NumPy arrays for the later steps.
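
To illustrate the idea of a single adjacency matrix over several node types, the sketch below places each type in its own block of a global index space and fills in the edges. It is not the code of build_graph.py: the node ordering, the toy counts and edges, the undirected treatment of relations, and the helper names are all assumptions.

```python
# Sketch: put typed nodes into one global index space and fill a
# symmetric adjacency matrix. Counts and edges are toy values.
NODE_TYPES = ["user", "repo", "issue", "pr"]


def type_offsets(counts):
    """Map each node type to the first global index of its block."""
    offsets, start = {}, 0
    for node_type in NODE_TYPES:
        offsets[node_type] = start
        start += counts[node_type]
    return offsets, start  # start now equals the total node count


def build_adjacency(counts, edges):
    """edges: iterable of ((type_a, local_id_a), (type_b, local_id_b))."""
    offsets, total = type_offsets(counts)
    adj = [[0] * total for _ in range(total)]
    for (type_a, id_a), (type_b, id_b) in edges:
        i = offsets[type_a] + id_a
        j = offsets[type_b] + id_b
        adj[i][j] = 1
        adj[j][i] = 1  # assumption: every relation is treated as undirected
    return adj


counts = {"user": 2, "repo": 1, "issue": 2, "pr": 1}
edges = [
    (("user", 0), ("issue", 1)),  # user 0 is related to issue 1
    (("user", 0), ("pr", 0)),     # user 0 is related to PR 0
    (("repo", 0), ("issue", 1)),  # issue 1 belongs to repo 0
    (("issue", 1), ("pr", 0)),    # an issue-PR link
]
adj = build_adjacency(counts, edges)
```

With this layout, a node's type can be recovered from its global index alone, which is what lets later steps slice the matrix by node type.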
construct_metapath.py
The script first loads the edge and index files for user-repo, user-issue, user-PR, repo-repo, repo-issue, repo-PR, issue-issue, issue-PR, and PR-PR.
It then loads the adjacency matrices and organizes them into lists by node type (users, repositories, issues, and PRs).
Next, it generates the expected metapaths from predefined patterns. These metapaths are mapped to the corresponding indices and stored as pickle files, NumPy arrays, and adjacency lists for later processing.
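
As an illustration of what generating metapaths from a pattern means, the sketch below enumerates all node sequences whose types follow a given pattern. It is a simplified stand-in, not the logic of construct_metapath.py: the patterns issue-user-pr and issue-repo-pr, the toy graph, the rule that a node may not appear twice in one instance, and the function name are assumptions.

```python
# Sketch: enumerate instances of a metapath such as issue-user-pr.
# `neighbors` maps (type, id) -> list of (type, id); the data is a toy graph.
def metapath_instances(neighbors, start_nodes, metapath):
    """Return every node sequence whose node types follow `metapath`.

    metapath is a list of node types, e.g. ["issue", "user", "pr"].
    """
    paths = [[node] for node in start_nodes if node[0] == metapath[0]]
    for next_type in metapath[1:]:
        extended = []
        for path in paths:
            for nbr in neighbors.get(path[-1], []):
                # Assumption of this sketch: no node is visited twice.
                if nbr[0] == next_type and nbr not in path:
                    extended.append(path + [nbr])
        paths = extended
    return paths


neighbors = {
    ("issue", 0): [("user", 0), ("repo", 0)],
    ("user", 0): [("issue", 0), ("pr", 0), ("pr", 1)],
    ("repo", 0): [("issue", 0), ("pr", 1)],
    ("pr", 0): [("user", 0)],
    ("pr", 1): [("user", 0), ("repo", 0)],
}
iup = metapath_instances(neighbors, [("issue", 0)], ["issue", "user", "pr"])
irp = metapath_instances(neighbors, [("issue", 0)], ["issue", "repo", "pr"])
```

Here `iup` holds two instances (issue 0 reaches PR 0 and PR 1 through user 0) and `irp` holds one (issue 0 reaches PR 1 through repo 0). The number of instances grows quickly with graph size, which is consistent with the metapath files being too large to upload.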
AIPL_main.py
This script handles model training and inference. Users can train and evaluate AIPL by running AIPL_main.py.
It covers data loading, model setup, training with early stopping, and evaluation with accuracy, precision, recall, and F1-score.
Data loading and batching are implemented in data.py, preprocess.py, and tools.py in the 'utils' folder.
The AIPL model consists of intra-metapath aggregation, inter-metapath aggregation, and an attention mechanism.
The model code is in base_magnn.py and magnn_lp.py under the 'magnn_model' directory and is called directly by AIPL_main.py.
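
The two aggregation stages can be pictured with a small pure-Python sketch. It is deliberately simplified and is not the code in base_magnn.py or magnn_lp.py: the attention across instances within a metapath is replaced by a plain mean, learned projections are omitted, and `query` is a stand-in for a trained attention vector. All names and numbers are illustrative.

```python
# Sketch: mean intra-metapath aggregation followed by softmax attention
# across metapaths. No learned weights; `query` is purely illustrative.
import math


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def intra_metapath(instances):
    """Encode each instance as the mean of its node features, then
    average the instance encodings of one metapath."""
    return mean_vectors([mean_vectors(inst) for inst in instances])


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inter_metapath(summaries, query):
    """Weight each metapath summary by its attention score and sum them."""
    scores = [sum(q * math.tanh(x) for q, x in zip(query, vec))
              for vec in summaries]
    weights = softmax(scores)
    fused = [sum(w * vec[d] for w, vec in zip(weights, summaries))
             for d in range(len(query))]
    return fused, weights


# Two metapaths for one target node; each instance is a list of
# node feature vectors along the path.
path_a = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
path_b = [[[0.0, 0.0], [2.0, 2.0]]]
summaries = [intra_metapath(path_a), intra_metapath(path_b)]
fused, weights = inter_metapath(summaries, query=[0.5, 0.5])
```

The attention weights sum to one, so the fused vector is a convex combination of the per-metapath summaries; metapaths with a higher score contribute more to the final node representation.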

You can also set a series of parameters in AIPL_main.py, including learning_rate, epoch_number, drop_out, the number of attention heads, and the instance encoder.
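
If these parameters were exposed on the command line, the setup could look like the sketch below. The flag names and default values are placeholders chosen for illustration; they are not taken from AIPL_main.py and should be checked against the script.

```python
# Sketch: expose the hyperparameters through argparse.
# Flag names and defaults are placeholders, not the repository's values.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="AIPL training options (illustrative)")
    parser.add_argument("--learning-rate", type=float, default=0.005)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--dropout", type=float, default=0.5)
    parser.add_argument("--num-heads", type=int, default=8)
    parser.add_argument("--instance-encoder", type=str, default="mean")
    return parser


# Parse an explicit argument list so the example does not read sys.argv.
args = build_parser().parse_args(
    ["--learning-rate", "0.001", "--num-heads", "4"])
```

argparse turns the hyphens in flag names into underscores, so the values are available as `args.learning_rate`, `args.num_heads`, and so on.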

Example Presentation

Example 1
Example 2
Example 3

Copyright

All copyright of the tool is owned by the author of the paper.
