pyterrier-quality

Content quality estimation with PyTerrier.

Installation

pip install git+https://github.com/terrierteam/pyterrier-quality

Models & Artifacts

The following pre-trained QualT5 models are available:

Model ID	Base Model
`pyterrier-quality/qt5-tiny`	`google/t5-efficient-tiny`
`pyterrier-quality/qt5-small`	`t5-small`
`pyterrier-quality/qt5-base`	`t5-base`

You can load these models using:

from pyterrier_quality import QualT5
model = QualT5('pyterrier-quality/qt5-tiny') # or another Model ID

The following cached quality scores for the following datasets are also available:

Quality Model	Dataset	Cache ID
`qt5-base`	`msmarco-passage`	`pyterrier-quality/qt5-base.msmarco-passage.cache`
`qt5-small`	`msmarco-passage`	`pyterrier-quality/qt5-small.msmarco-passage.cache`
`qt5-tiny`	`msmarco-passage`	`pyterrier-quality/qt5-tiny.msmarco-passage.cache`
(random)	`msmarco-passage`	`pyterrier-quality/rand.msmarco-passage.cache`
ITN	`msmarco-passage`	`pyterrier-quality/itn.msmarco-passage.cache`
CDD	`msmarco-passage`	`pyterrier-quality/cdd.msmarco-passage.cache`
EPIC (Qual.)	`msmarco-passage`	`pyterrier-quality/epic.msmarco-passage.cache`
TAS-B (Mag.)	`msmarco-passage`	`pyterrier-quality/tasb-mag.msmarco-passage.cache`
t5-base (Ppl.)	`msmarco-passage`	`pyterrier-quality/t5-ppl.msmarco-passage.cache`
gpt2 (Ppl.)	`msmarco-passage`	`pyterrier-quality/gpt2-ppl.msmarco-passage.cache`
`qt5-tiny`	`cord19`	`pyterrier-quality/qt5-tiny.cord19.cache`
(random)	`cord19`	`pyterrier-quality/rand.cord19.cache`
`qt5-tiny`	`msmarco-passage-v2`	`pyterrier-quality/qt5-tiny.msmarco-passage-v2.cache`
(random)	`msmarco-passage-v2`	`pyterrier-quality/rand.msmarco-passage-v2.cache`

You can load a cache using:

from pyterrier_quality import QualCache
cache = QualCache.from_url('hf:pyterrier-quality/qt5-tiny.msmarco-passage.cache') # or another Cache ID (note the hf: prefix)

For convenience, specifying the @quantiles branch on any of the caches provides a version of the quality scores converted into the corresponding quantile score. For example:

from pyterrier_quality import QualCache
cache = QualCache.from_url('hf:pyterrier-quality/qt5-tiny.msmarco-passage.cache@quantiles')

The following indexes are available, based on the quality scores above:

Quality Model	Dataset	PISA (BM25)	PISA (SPLADE (lg))
`qt5-tiny`	`msmarco-passage`	`qt5-tiny.msmarco-passage.pisa`	`qt5-tiny.msmarco-passage.splade-lg.pisa`
(random)	`msmarco-passage`	`rand.msmarco-passage.pisa`
`qt5-tiny`	`cord19`	`qt5-tiny.cord19.pisa`	`qt5-tiny.cord19.splade-lg.pisa`
(random)	`cord19`	`rand.cord19.pisa`
`qt5-tiny`	`msmarco-passage-v2`	`qt5-tiny.msmarco-passage-v2.pisa`	`qt5-tiny.msmarco-passage-v2.splade-lg.pisa`
(random)	`msmarco-passage-v2`	`rand.msmarco-passage-v2.pisa`

Using a Quality Filter in an Indexing Pipeline

QualT5 and Filter classes can be used in a PyTerrier indexing pipeline, allwowing use with Terrier, PISA, Dense, ColBERT, or SPLADE indexers, for example:

from pyterrier_quality import QualT5, Filter
qmodel = QualT5('pyterrier-quality/qt5-tiny')

pipe = qmodel >> Filter(0.8) >> splade_indexer
pipe.index(corpus)

Citation

This repository is for the paper Neural Passage Quality Estimation for Static Pruning at SIGIR 2024. If you use this work, please cite:

@inproceedings{DBLP:conf/sigir/ChangMMM24,
  author       = {Xuejun Chang and
                  Debabrata Mishra and
                  Craig Macdonald and
                  Sean MacAvaney},
  title        = {Neural Passage Quality Estimation for Static Pruning},
  booktitle    = {Proceedings of the 47th International {ACM} {SIGIR} Conference on
                  Research and Development in Information Retrieval, {SIGIR} 2024},
  publisher    = {{ACM}},
  year         = {2024},
  url          = {https://doi.org/10.1145/3626772.3657765},
  doi          = {10.1145/3626772.3657765}
}

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
figures		figures
img		img
pyterrier_quality		pyterrier_quality
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pyterrier-quality

Installation

Models & Artifacts

Using a Quality Filter in an Indexing Pipeline

Citation

About

Releases

Packages

Contributors 2

Languages

License

terrierteam/pyterrier-quality

Folders and files

Latest commit

History

Repository files navigation

pyterrier-quality

Installation

Models & Artifacts

Using a Quality Filter in an Indexing Pipeline

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages