querysum

This repository releases the code for Coarse-to-Fine Query Focused Multi-Document Summarization. Please cite the following paper [bib] if you use this code:

Xu, Yumo, and Mirella Lapata. "Coarse-to-Fine Query Focused Multi-Document Summarization." In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3632-3645. 2020.

We consider the problem of better modeling query-cluster interactions to facilitate query focused multi-document summarization. Due to the lack of training data, existing work relies heavily on retrieval-style methods for assembling query relevant summaries. We propose a coarse-to-fine modeling framework which employs progressively more accurate modules for estimating whether text segments are relevant, likely to contain an answer, and central. The modules can be independently developed and leverage training data if available. We present an instantiation of this framework with a trained evidence estimator which relies on distant supervision from question answering (where various resources exist) to identify segments which are likely to answer the query and should be included in the summary. Our framework is robust across domains and query types (i.e., long vs short) and outperforms strong comparison systems on benchmark datasets.

Should you have any queries, please contact me at [email protected].

Project structure

querysum
└───requirements.txt
└───README.md
└───log  # logging files
└───src  # source files
└───data  # test sets
└───graph  # graph components for centrality estimation, e.g., sim matrix and relevance vector
└───model  # QA models for inference
└───rank  # ranking lists from sentence-level model
└───text  # predicted summaries from sentence-level model
└───rank_passage  # ranking lists from passage-level model
└───text_passage  # predicted summaries from passage-level model

After cloning this project, use the following command to initialize the structure:

mkdir log data graph model rank text rank_passage text_passage

Trained models used in the paper can be downloaded here. Please put them under querysum/model.

Create environment

cd ..
virtualenv -p python3.6 querysum
cd querysum
. bin/activate
pip install -r requirements.txt

You need to install apex:

cd ..
git clone https://www.github.com/nvidia/apex
cd apex
python3 setup.py install

Also, you need to set up ROUGE evaluation if you have not yet done so. Please refer to this repository. After finishing the setup, specify the ROUGE path in frame/utils/config_loader.py as an attribute of PathParser:

self.rouge_dir = '~/ROUGE-1.5.5/data'  # specify your ROUGE dir

Lastly, to run the centrality module with query injection, replace the algorithms package under the standard lexrank library with our code. You can download our code here, and put it under ~/querysum. Then:

rm -rf ~/querysum/lib/python3.6/site-packages/lexrank/algorithms
unzip algorithms.zip -d ~/querysum/lib/python3.6/site-packages/lexrank/
rm algorithms.zip

Prepare benchmark data

Since we are not allowed to distribute DUC data, please request DUC 2005-2007 from NIST. After acquiring the data, gather each year's clusters, summaries, and queries under data/docs, data/summary_targets, and data/topics, respectively. For instance, DUC 2006's clusters, queries, and summaries should be found under data/docs/2006/, data/topics/2006.sgml, and data/summary_targets/2006/, respectively.
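
For reference, each year's topic file stores the queries as SGML records. The sketch below shows one way to read them; the <num>/<title>/<narr> tag layout is assumed from the standard DUC topic format, and the function and field names are illustrative rather than part of the repository's code:

import re

def load_duc_topics(path):
    # Hedged sketch: read queries from data/topics/<year>.sgml, assuming the
    # standard DUC <topic>/<num>/<title>/<narr> layout.
    def field(tag, block):
        m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", block, flags=re.S)
        return m.group(1).strip() if m else ""

    with open(path, encoding="latin-1") as f:
        raw = f.read()
    topics = []
    for block in re.findall(r"<topic>(.*?)</topic>", raw, flags=re.S):
        topics.append({
            "id": field("num", block),         # cluster id, e.g., D0601A
            "title": field("title", block),    # short query
            "narrative": field("narr", block)  # long query
        })
    return topics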

TD-QFS data can be downloaded from here. You can also use the processed version here.

After data preparation, you should have the following directory structure with the right files under each folder:

querysum
└───data
│   └───docs  # DUC clusters 
│   └───passages  # DUC passage objects for passage-level QA (generated by our code)
│   └───topics  # DUC queries
│   └───summary_targets  # DUC reference summaries
│   └───tdqfs  # documents, queries and reference summaries in TD-QFS

Run experiments

We go through the three stages of the QuerySum pipeline: retrieval, answering, and summarization.

Retrieval

In src/frame/ir/ir_tf.py,

rank_e2e()  # rank all sentences
ir_rank2records()  # filter sentences based on IR scores

Specifically, rank_e2e builds a directory under rank, e.g., rank/ir-tf-2007, which stores ranking files for each cluster. On top of that, ir_rank2records builds a filtered record for each cluster, e.g., under rank/ir_records-ir-tf-2007-0.75_ir_conf, where 0.75 is the accumulated confidence.
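
One plausible reading of the accumulated confidence (an illustration only, not the repository's implementation) is that top-ranked sentences are kept until their normalized IR scores cover 75% of the total score mass:

def filter_by_accumulated_confidence(ranked, conf=0.75):
    # Hedged sketch; function and variable names are illustrative.
    # ranked: list of (sentence_id, ir_score), sorted by score in descending order.
    total = sum(score for _, score in ranked)
    kept, acc = [], 0.0
    for sid, score in ranked:
        kept.append(sid)
        acc += score / total   # normalize scores into a distribution
        if acc >= conf:        # stop once the confidence threshold is covered
            break
    return kept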

You can specify IR configs in src/frame/ir/ir_config.py.

Answering

For sentence-level QA (i.e., answer sentence selection), in src/frame/bert_qa/main.py,

run(prep=True, mode='rec')

Note that you only need to set prep=True the first time; this calculates and saves query relevance scores for the sentences retrieved by the previous module. The scores are then converted into a ranking list, from which the top K sentences are selected.
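
For orientation, the sketch below shows the general shape of this step: score each retrieved sentence against the query with a BERT cross-encoder and keep the top K. The checkpoint name, helper names, and K value are placeholders; the paper uses its own evidence estimator trained with distant supervision from QA data (shipped under querysum/model).

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; in practice, load the trained model under querysum/model.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def score_sentences(query, sentences):
    # Relevance score for each (query, sentence) pair.
    scores = []
    for sent in sentences:
        inputs = tok(query, sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        scores.append(logits.softmax(-1)[0, 1].item())  # prob. of the "relevant" class
    return scores

def select_top_k(sentences, scores, k=60):
    # k is illustrative, not the configuration used in the paper.
    ranked = sorted(zip(sentences, scores), key=lambda x: x[1], reverse=True)
    return [s for s, _ in ranked[:k]]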

You can specify QA config in src/frame/bert_qa/qa_config.py.

For passage-level QA (i.e., MRC), use src/frame/bert_passage/infer.py.
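
The passage-level model reads each retrieved passage and extracts an answer span. A generic sketch with a publicly available extractive QA model (not the repository's fine-tuned checkpoint, and with an invented example query) looks like this:

from transformers import pipeline

# Generic MRC sketch; the repository ships its own fine-tuned checkpoint under querysum/model.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(question="What measures are proposed to protect wetlands?",  # illustrative query
            context="The report recommends restoring coastal wetlands and limiting development ...")
print(result["answer"], result["score"])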

Summarization

In src/frame/centrality/centrality_qa_tfidf_hard.py, run the following methods in order (or at once):

build_components_e2e()  # build graph components
score_e2e()  # run Markov Chain
rank_e2e()  # rank sentences
select_e2e()  # compose summary

Specifically, build_components_e2e builds the similarity matrix and query relevance vector for the sentences selected in the last step. score_e2e runs a query-focused Markov chain until convergence. rank_e2e ranks sentences considering both salience (the stationary distribution) and redundancy. Finally, select_e2e composes the summary from the ranking.
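
A minimal sketch of a query-focused Markov chain of this kind is given below; the damping factor and normalization choices are assumptions for illustration, not the repository's settings.

import numpy as np

def query_focused_markov_chain(sim, rel, d=0.85, tol=1e-6, max_iter=1000):
    # sim: (n, n) sentence similarity matrix; rel: (n,) query relevance vector.
    trans = sim / sim.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    bias = rel / rel.sum()                         # query relevance as the jump distribution
    p = np.full(sim.shape[0], 1.0 / sim.shape[0])
    for _ in range(max_iter):
        p_next = d * trans.T @ p + (1 - d) * bias  # one step of the biased random walk
        if np.abs(p_next - p).sum() < tol:         # converged to the stationary distribution
            return p_next
        p = p_next
    return p                                       # salience scores used for ranking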

You can specify centrality configs in src/frame/ir/centrality_config.py.