DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance

This repository is the official implementation of "DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance", which was accepted to Findings of ACL 2023.

Datasets

The datasets used in our experiments are available on Hugging Face. The dataset has three splits, arxiv, un, and newspop, which correspond to the three datasets used in the paper. Each split has two columns:

text: The text of the document in the corpus
time_discrete: The time stamp of the document in the corpus

Requirements

This code was run on Python 3.8.10. We recommend creating a virtual environment to run DynaMiTE.

To install requirements:

pip install -r requirements.txt

Preprocessing

First, create a folder with your dataset name in the data folder (e.g. data/arxiv). To load in your dataset, create a data.csv file where each row is a document. This CSV must contain at least two columns, text and time_discrete, which correspond to the text of the document as well as an ordinal integer representing the time step of the document.
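As an illustration of the expected layout, the following is a minimal sketch for building data.csv from raw (text, timestamp) pairs. The function name write_data_csv is hypothetical and not part of this repository, and numbering the time steps from 0 is an assumption; the only stated requirements are the text and time_discrete columns, with time_discrete an ordinal integer.

```python
import csv
from pathlib import Path


def write_data_csv(records, dataset_dir):
    """Write (text, timestamp) pairs to <dataset_dir>/data.csv.

    Hypothetical helper, not part of the DynaMiTE code base.
    Timestamps can be any sortable values (e.g. years).
    """
    records = list(records)
    # Rank the distinct timestamps so each one becomes an ordinal integer.
    # Starting at 0 is an assumption; adjust if your setup expects otherwise.
    steps = {t: i for i, t in enumerate(sorted({t for _, t in records}))}

    out_dir = Path(dataset_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "data.csv"

    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "time_discrete"])  # required column names
        for text, t in records:
            writer.writerow([text, steps[t]])
    return out_path
```

For example, calling write_data_csv([("first abstract", 2019), ("second abstract", 2021)], "data/arxiv") would create data/arxiv/data.csv with one row per document.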

You must also download the zip file containing AutoPhrase and move it into the preprocessing folder.

To preprocess the dataset, navigate to the preprocessing folder, specify the parameters at the top of the preprocess.py file, and run the following command:

python preprocess.py

Preprocessing populates the folder containing your CSV with additional files. Expect it to take about 15 minutes per dataset.

We provide links to download the Arxiv, UN, and Newspop datasets.

Training

First, open train_model/train.py and specify your parameters at the top of the file. Then, navigate back to the parent directory.

You can train DynaMiTE by running the following command:

python train_model/train.py

You must specify an output folder, which defaults to the results folder. Inside the output folder, you must also create a subfolder with the same dataset name used in the preprocessing step (e.g. results/arxiv/). After training, this folder will be populated with the topic evolutions in a text file, along with the embeddings from the discriminative dynamic word embedding space.
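The output subfolder can be created by hand or with a few lines of Python. The sketch below assumes the default results folder and uses a hypothetical helper name, make_output_dir, that does not exist in this repository.

```python
from pathlib import Path


def make_output_dir(dataset_name, results_root="results"):
    """Create <results_root>/<dataset_name>/ if it does not already exist.

    Hypothetical helper; "results" mirrors the default output folder
    mentioned above, and dataset_name should match the folder under data/.
    """
    out_dir = Path(results_root) / dataset_name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling make_output_dir("arxiv") from the parent directory would create results/arxiv/ before training.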

Experiments

We include code for running quantitative experiments for NPMI and the category shift analysis. Both experiments require the outputs from training.

You can calculate NPMI by running the following command:

python eval.py
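For readers unfamiliar with the metric, NPMI (normalized pointwise mutual information) for a word pair is commonly defined as log(P(w1, w2) / (P(w1) P(w2))) divided by -log P(w1, w2). The sketch below averages it over all word pairs in a topic, estimating probabilities from document-level co-occurrence. It illustrates the standard definition only and is not necessarily how eval.py computes the score; the function name mean_npmi and the handling of the edge cases are assumptions.

```python
import math
from itertools import combinations


def mean_npmi(topic_words, documents):
    """Average NPMI over all word pairs of a topic.

    documents is a list of token lists; probabilities are estimated as the
    fraction of documents containing a word (or both words of a pair).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return 0.0

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1 = sum(w1 in d for d in doc_sets) / n_docs
        p2 = sum(w2 in d for d in doc_sets) / n_docs
        p12 = sum(w1 in d and w2 in d for d in doc_sets) / n_docs
        if p12 == 0.0:
            # The pair never co-occurs: NPMI takes its minimum value.
            scores.append(-1.0)
        elif p12 == 1.0:
            # The ratio is 0/0 here; treating it as 0 is an assumption.
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))

    # A topic with fewer than two words has no pairs to score.
    return sum(scores) / len(scores) if scores else 0.0
```

Scores range from -1 (words never co-occur) to 1 (words always co-occur), with 0 indicating independence.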

You can run the category experiment through the following command:

python shift_study.py
