Code for Findings of ACL 2023 Paper "DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance"

nbalepur/DynaMiTE

DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance

This repository is the official implementation of "DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance", which was accepted to Findings of ACL 2023.

Datasets

The datasets used in our experiments are available on Hugging Face. The dataset has three splits, arxiv, un, and newspop, which correspond to the three datasets used in the paper. Each split has two columns:

  • text: The text of the document in the corpus
  • time_discrete: The time step of the document in the corpus, as an ordinal integer

Requirements

This code was run on Python 3.8.10. We recommend creating a virtual environment to run DynaMiTE.

To install requirements:

pip install -r requirements.txt

Preprocessing

First, create a folder named after your dataset inside the data folder (e.g. data/arxiv). To load your dataset, create a data.csv file in that folder where each row is a document. This CSV must contain at least two columns: text, the text of the document, and time_discrete, an ordinal integer representing the time step of the document.
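
As a minimal sketch of the layout described above, the following function writes a data.csv with the two required columns. The helper name write_dataset_csv and the choice to map sorted raw timestamps to 0, 1, 2, ... are assumptions made for illustration; they are not part of this repository, and the preprocessing script may expect a different starting index.

```python
import csv
from pathlib import Path


def write_dataset_csv(docs, dataset_name, data_root="data"):
    """Write (text, timestamp) pairs to <data_root>/<dataset_name>/data.csv.

    Hypothetical helper, not part of the repository. Raw timestamps
    (e.g. years) are mapped to ordinal integers 0, 1, 2, ... in sorted
    order; the zero-based start is an assumption.
    """
    out_dir = Path(data_root) / dataset_name
    out_dir.mkdir(parents=True, exist_ok=True)

    # Map each distinct raw timestamp to its rank among the sorted timestamps.
    step_of = {t: i for i, t in enumerate(sorted({t for _, t in docs}))}

    out_path = out_dir / "data.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "time_discrete"])
        writer.writeheader()
        for text, t in docs:
            writer.writerow({"text": text, "time_discrete": step_of[t]})
    return out_path
```

Calling write_dataset_csv(docs, "arxiv") with a list of (text, year) pairs would create data/arxiv/data.csv, overwriting any existing file at that path.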

The zip file containing AutoPhrase must also be downloaded from here. This zip file should be moved into the preprocessing folder.

To preprocess the dataset, navigate to /preprocess/, specify the parameters at the top of the preprocess.py file, and run the following command:

python preprocess.py

The folder containing your CSV will be populated with additional files. Expect preprocessing to take about 15 minutes per dataset.

We provide links to download the Arxiv, UN, and Newspop datasets.

Training

First, open train_model/train.py and specify your parameters at the top of the file. Then navigate back to the parent directory.

You can train DynaMiTE by running the following command:

python train_model/train.py

You must specify an output folder, which defaults to the results folder. Inside the output folder, you must also create a subfolder with the same dataset name used in the preprocessing step (e.g. results/arxiv/). After training, this folder will be populated with the topic evolutions in a text file, along with the embeddings from the discriminative dynamic word embedding space.
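
The folder setup described above can be sketched as follows. The helper name prepare_output_dir is hypothetical and not part of this repository; the default folder names data and results are taken from this README, and the check for data.csv is an added convenience rather than something the training script is known to do.

```python
from pathlib import Path


def prepare_output_dir(dataset_name, data_root="data", results_root="results"):
    """Create <results_root>/<dataset_name>/ after checking the input CSV exists.

    Hypothetical helper, not part of the repository.
    """
    data_csv = Path(data_root) / dataset_name / "data.csv"
    if not data_csv.is_file():
        raise FileNotFoundError(f"Expected dataset at {data_csv}")

    # Mirror the dataset folder name under the output folder, e.g. results/arxiv/.
    out_dir = Path(results_root) / dataset_name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Run from the repository's parent directory, prepare_output_dir("arxiv") would create results/arxiv/ if data/arxiv/data.csv is present.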

Experiments

We include code for running quantitative experiments for NPMI and the category shift analysis. Both experiments require the outputs from training.

You can calculate NPMI by running the following command:

python eval.py
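
For readers unfamiliar with the metric, NPMI (normalized pointwise mutual information) for a word pair is log(P(w_i, w_j) / (P(w_i) P(w_j))) / (-log P(w_i, w_j)), and a topic's score is typically the mean over pairs of its top words. The sketch below is an illustration only and is not the code in eval.py: it assumes document-level co-occurrence, assigns -1 to pairs that never co-occur, and assigns 0 to the undefined case where both words appear in every document.

```python
import math
from itertools import combinations


def npmi(topic_words, documents):
    """Mean pairwise NPMI of topic_words using document-level co-occurrence.

    documents is a list of token lists. Illustrative sketch only; the
    handling of the edge cases below is an assumption.
    """
    if len(topic_words) < 2:
        raise ValueError("Need at least two topic words")
    if not documents:
        raise ValueError("Need at least one document")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p12 = prob(w1, w2)
        if p12 == 0.0:
            # The pair never co-occurs: use the lower bound of NPMI.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words are in every document: 0/0, scored as 0 here.
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (prob(w1) * prob(w2)))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

Scores range from -1 (the words never appear together) through 0 (independence) to 1 (the words always appear together).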

You can run the category experiment through the following command:

python shift_study.py
