Named Entity Recognition for Telugu using LSTM-CRF

This repository contains the code for the paper "Named Entity Recognition for Telugu using LSTM-CRF". Please cite the paper if you use this code:

@inproceedings{reddy2018named,
  title={Named Entity Recognition for Telugu using LSTM-CRF},
  author={Reddy, Aniketh Janardhan and Adusumilli, Monica and Gorla, Sai Kiranmai and Neti, Lalita Bhanu Murthy and Malapati, Aruna},
  booktitle={WILDRE4--4th Workshop on Indian Language Data: Resources and Evaluation},
  pages={6},
  year={2018}
}

The dataset can be found in the data/Gold_Data_Telugu folder. The code for reproducing the results is in the lstmcrf folder.
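The file names used below (for example train_sentences_9_IOB.txt) suggest IOB-tagged data in the column layout that CRF++ and YamCha read: one token per line, whitespace-separated columns, and a blank line between sentences. The following is a minimal sketch of a reader for that layout, assuming the first column is the token and the last column is the tag. The function name `read_iob_sentences` and the sample tokens and tags are illustrative and are not taken from this repository.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_iob_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from column-formatted lines separated by blank lines."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence (if any).
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the IOB tag.
        sentence.append((columns[0], columns[-1]))
    if sentence:
        yield sentence


# Illustrative input only; the real tag set may differ.
sample = ["token1 B-LOC", "token2 O", "", "token3 B-PER"]
sentences = list(read_iob_sentences(sample))
print(len(sentences), "sentences")
```

To read a real file, pass an open file handle instead of the list, for example `open(path, encoding="utf-8")`.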

Steps to reproduce LSTM-CRF results:

Download the fastText pre-trained word vectors for Telugu from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md and put them in a folder called vectors inside the data directory (i.e., data/vectors).
Run build_data.py, which generates the vocabulary and the directory structure.
Train the model by running train.py.
Get the model's predictions on the test set by running predict_test.py.
Run the evaluation script in the conll_evaluation folder by executing "perl conll < ../data/LSTM-CRF/predictions/predictions_9-no-dev.txt". The values of the various metrics will be displayed.
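Before training, it can be useful to check that the downloaded vectors are readable. The sketch below parses the fastText text format (a header line with the vocabulary size and dimension, then one word and its values per line). It is an illustration only and is not necessarily how build_data.py reads the vectors; the function name `load_vec` and the tiny in-memory sample are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def load_vec(lines: Iterable[str],
             vocab: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Parse fastText text-format vectors, optionally keeping only words in vocab."""
    it = iter(lines)
    header = next(it).split()  # "<number of words> <dimension>"
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        # Split from the right so the last `dim` fields are the vector values.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        word = parts[0]
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(x) for x in parts[1:]]
    return vectors


# Tiny made-up sample in the same layout as a .vec file.
sample = ["2 3", "word1 0.1 0.2 0.3", "word2 1 2 3 "]
vectors = load_vec(sample)
print(len(vectors), "vectors of dimension", len(vectors["word1"]))
```

For a real file, pass an open handle such as `open(path, encoding="utf-8")`, where `path` points at the text-format vector file placed in data/vectors. Passing the model vocabulary as `vocab` keeps memory use low, since only matching words are stored.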

Steps to reproduce YamCha results:

Run "./configure"
Run "make"
Execute "sudo make install"
Execute 'make CORPUS=../data/Gold_Data_Telugu/train_sentences_9_IOB.txt MODEL=mon_project SVM_PARAM="-t1 -d2 -c1" train' in the yamcha folder to train the model.
Execute 'yamcha -m mon_project.model < ../data/Gold_Data_Telugu/test_sentences_9_IOB.txt > ../data/YamCha/results9_IOB.txt' to get the model's predictions on the test set.
Run the evaluation script in the conll_evaluation folder by executing "perl conll < ../data/YamCha/results9_IOB.txt". The values of the various metrics will be displayed.

Steps to reproduce CRF++ results:

Run "./configure"
Run "make"
Execute "sudo make install"
To train the model, run "./crf_learn -f 3 -c 1.5 template ../data/Gold_Data_Telugu/train_sentences_9_IOB.txt model"
To get the model's predictions on the test set, run "./crf_test -m model ../data/Gold_Data_Telugu/test_sentences_9_IOB.txt > ../data/CRF++/results9_IOB.data"
Run the evaluation script in the conll_evaluation folder by executing "perl conll < ../data/CRF++/results9_IOB.data". The values of the various metrics will be displayed.
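All three pipelines are scored with the same evaluation script, which reports entity-level metrics. The sketch below shows one way such metrics can be computed from IOB tags: an entity counts as correct only if its boundaries and type both match the gold annotation. It is a simplified illustration, not a replacement for the script in conll_evaluation, whose exact handling of edge cases may differ. The function names `iob_spans` and `entity_prf` and the example tags are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def iob_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a flat sequence of IOB tags."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # "B-" always opens an entity; a stray "I-" opens one as well.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 over exact span-and-type matches."""
    gold_spans, pred_spans = iob_spans(gold), iob_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: the prediction finds one of the two gold entities.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
precision, recall, f1 = entity_prf(gold, pred)
print(precision, recall, f1)
```

When scoring several sentences with this sketch, join their tag sequences with an "O" between sentences so that entities cannot span a sentence boundary.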

Most of the LSTM-CRF code is derived from https://github.com/guillaumegenthial/sequence_tagging.
