This repository contains the code for reproducing the preliminary results reported in the paper "Named Entity Recognition as Graph Classification" (currently under review for the ESWC 2021 poster track).
The code is organized as notebooks, to be used as follows:
final_generate_gazetteers.ipynb
: to generate gazetteers from Wikidata (by specifying a list of QIDs corresponding to the entity types one wishes to extract)

edge_list_generation.ipynb
: to generate the graph structure used to build the graph embeddings; when applied to the CoNLL 2003 train dataset, one should get a result similar to this Python dict data structure

graph_embeddings_generation.ipynb
: to generate node embeddings using one of the algorithms (e.g. node2vec, SDNE) provided by the GEM library

node2vec_classification.ipynb
: to train a model on the node2vec embeddings

transE_classification.ipynb
: to train a model on the TransE embeddings

autoencoder_embeddings.ipynb
: to generate auto-encoder embeddings from the binary graph representations

autoencoder_classification.ipynb
: to train a model on the auto-encoder embeddings

GCN_classification.ipynb
: to train a Graph Convolutional Network (based on this architecture)
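To give a feel for the kind of adjacency dict mentioned for edge_list_generation.ipynb, here is a minimal sketch that links each token to its neighbours within a fixed window. This is an illustration only: the function name, the window-based construction, and the toy sentence are assumptions, not the notebook's actual graph-building logic.

```python
from collections import defaultdict

def build_edge_dict(sentences, window=1):
    """Build an adjacency dict mapping each token to the tokens that
    co-occur with it within `window` positions.
    Illustrative sketch only: the notebook's actual graph construction
    (node definition, edge criteria) may differ."""
    edges = defaultdict(set)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    edges[tok].add(tokens[j])
    # Sort neighbour lists for reproducible output
    return {tok: sorted(neigh) for tok, neigh in edges.items()}

sentences = [["EU", "rejects", "German", "call"]]
print(build_edge_dict(sentences))
```

A dict of this shape (token → neighbour list) can then be consumed by graph-embedding libraries such as GEM after converting it to an edge list.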
The code will be streamlined into stand-alone configurable scripts and fully documented soon.
Dependencies:

- Python 3.8
- PyTorch 1.7
- GEM
- PyTorch Geometric
- SPARQLWrapper
- tqdm
- Numpy
- Pandas
The table below shows the best performance of each model on the CoNLL-2003 validation (dev) set:
Method | Accuracy | Micro-F1 | Macro-F1 |
---|---|---|---|
Auto-encoder | 91.8 | 91.5 | 71.7 |
Node2Vec | 93.8 | 94.1 | 82.1 |
Trans-E | 94.1 | 93.6 | 78.8 |
GCN | 96.5 | 96.5 | 88.8 |
As for test set performance:
Method | Micro-F1 | Macro-F1 |
---|---|---|
Auto-encoder | 91.5 | 70.4 |
Node2Vec | 91.1 | 72.6 |
Trans-E | 91.9 | 74.5 |
GCN | 94.1 | 81.0 |
LUKE | 94.3 | – |