HeadLine Grouping

This repository contains the dataset and models described in the NAACL 2021 paper: News Headline Grouping as a Challenging NLU Task.

Example Headline Groups from the International Space Station timeline.

Dataset Releases

In this release, we provide two versions of HLGD (the HeadLine Grouping Dataset):

1. The original annotation of the 10 timelines in HLGD, with annotator identities anonymized. For each headline, we provide: the headline URL (url), the headline text (headline), the publication date (date), the group annotations of the 5 annotators (annot_1_group, ..., annot_5_group), and an aggregate group (global_group).
2. A classification-compatible version of HLGD containing ~20,000 headline pairs, each with a binary label indicating whether the two headlines belong to the same global group. This version is integrated into HuggingFace's datasets library and can be loaded as follows:

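# Install the library first (shell): pip install datasets
# In a Jupyter notebook, use: !pip install datasets
from datasets import load_dataset

# Load the HLGD classification dataset as a DatasetDict of splits
data = load_dataset('hlgd')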
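To illustrate how the binary labels of the classification version relate to the global_group field of the original annotation, here is a minimal sketch. The helper make_pairs and the toy headlines are hypothetical and made up for illustration; the sketch simply enumerates every pair within one timeline and does not reproduce how the ~20,000 released pairs were selected.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def make_pairs(headlines: List[Dict]) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: label every headline pair from a single timeline.

    Each dict is assumed to carry the 'headline' and 'global_group' fields
    of the original annotation release. The label is 1 when both headlines
    share a global group, and 0 otherwise.
    """
    pairs = []
    for a, b in combinations(headlines, 2):
        label = int(a["global_group"] == b["global_group"])
        pairs.append((a["headline"], b["headline"], label))
    return pairs


# Toy, made-up timeline (not actual HLGD data)
timeline = [
    {"headline": "Crew docks with the ISS", "global_group": 1},
    {"headline": "Astronauts arrive at the space station", "global_group": 1},
    {"headline": "Spacewalk planned for Friday", "global_group": 2},
]

pairs = make_pairs(timeline)
for headline_a, headline_b, label in pairs:
    print(label, "|", headline_a, "|", headline_b)
```

Enumerating all pairs grows quadratically with timeline length, which is one reason a released pair dataset would be a selection rather than the full set.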

Note: We considered the legal aspects of releasing HLGD, and consider that the release of the dataset falls under fair use. See LEGAL.md for more detail.

Model Releases

We release two models:

1. The cls_elec_base_hlgd_0.74f1.bin model corresponds to the "Electra Finetune on HLGD + Time" model in the paper. An example of its use is provided in model_classifier.py.
2. The gpt2med_headline_gen_1.645.bin model corresponds to the headline generator used for the Headline Generator Swap results. An example of its use is provided in model_generator_swap.py.

Cite the work

If you make use of the code, models, or algorithm, please cite our paper:

@inproceedings{Laban2021NewsHG,
  title={News Headline Grouping as a Challenging NLU Task},
  author={Laban, Philippe and Bandarkar, Lucas and Hearst, Marti A},
  booktitle={NAACL 2021},
  publisher = {Association for Computational Linguistics},
  year={2021}
}

Contributing

If you'd like to contribute, or have questions or suggestions, you can contact us at [email protected]. All contributions welcome!
