Content quality estimation with PyTerrier.
pip install git+https://github.com/terrierteam/pyterrier-quality
The following pre-trained QualT5 models are available:
| Model ID | Base Model |
|---|---|
| pyterrier-quality/qt5-tiny | google/t5-efficient-tiny |
| pyterrier-quality/qt5-small | t5-small |
| pyterrier-quality/qt5-base | t5-base |
You can load these models using:
from pyterrier_quality import QualT5
model = QualT5('pyterrier-quality/qt5-tiny') # or another Model ID
Cached quality scores are also available for the following datasets:
You can load a cache using:
from pyterrier_quality import QualCache
cache = QualCache.from_url('hf:pyterrier-quality/qt5-tiny.msmarco-passage.cache') # or another Cache ID (note the hf: prefix)
For convenience, specifying the @quantiles branch on any of the caches provides a version of the quality scores converted into the corresponding quantile score. For example:
from pyterrier_quality import QualCache
cache = QualCache.from_url('hf:pyterrier-quality/qt5-tiny.msmarco-passage.cache@quantiles')
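To illustrate what a quantile score means, here is a minimal sketch in plain Python. It assumes that a passage's quantile score is the fraction of passages in the collection whose raw quality score is less than or equal to its own; the exact convention used to build the @quantiles caches may differ. The helper `to_quantiles` is hypothetical and is not part of pyterrier-quality.

```python
from bisect import bisect_right

def to_quantiles(scores):
    """Map each raw score to the fraction of scores that are <= it.

    Assumption: this is one common definition of a quantile score; the
    conversion used for the @quantiles caches may be defined differently.
    """
    sorted_scores = sorted(scores)
    n = len(sorted_scores)
    # bisect_right counts how many sorted entries are <= the given score
    return [bisect_right(sorted_scores, s) / n for s in scores]

# Toy raw quality scores for five passages (made-up values)
raw = [0.10, 0.90, 0.40, 0.40, 0.75]
quantiles = to_quantiles(raw)
print(quantiles)
```

Under this definition, a threshold on the quantile score reads directly as a pruning rate: keeping only passages with a quantile score above 0.3 would drop roughly the lowest-scoring 30% of the collection, regardless of how the raw scores are distributed.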
The following indexes are available, based on the quality scores above:
| Quality Model | Dataset | PISA (BM25) | PISA (SPLADE (lg)) | FLEX (TAS-B) |
|---|---|---|---|---|
| qt5-tiny | msmarco-passage | qt5-tiny.msmarco-passage.pisa | qt5-tiny.msmarco-passage.splade-lg.pisa | |
| (random) | msmarco-passage | rand.msmarco-passage.pisa | | |
| qt5-tiny | cord19 | qt5-tiny.cord19.pisa | qt5-tiny.cord19.splade-lg.pisa | |
| (random) | cord19 | rand.cord19.pisa | | |
| qt5-tiny | msmarco-passage-v2 | qt5-tiny.msmarco-passage-v2.pisa | qt5-tiny.msmarco-passage-v2.splade-lg.pisa | |
| (random) | msmarco-passage-v2 | rand.msmarco-passage-v2.pisa | | |
The QualT5 and Filter classes can be used in a PyTerrier indexing pipeline, allowing use with Terrier, PISA, dense, ColBERT, or SPLADE indexers. For example:
from pyterrier_quality import QualT5, Filter
qmodel = QualT5('pyterrier-quality/qt5-tiny')
pipe = qmodel >> Filter(0.8) >> splade_indexer
pipe.index(corpus)
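Conceptually, the filtering step in the pipeline above drops low-scoring passages before they reach the indexer. The following is a minimal sketch of that idea in plain Python, not the library's implementation. It assumes that each document carries its score in a field named `quality` and that documents at or above the threshold are kept; both the field name and the inclusive comparison are assumptions, and `filter_by_quality` is a hypothetical helper.

```python
def filter_by_quality(docs, threshold, field="quality"):
    """Yield only the documents whose quality score meets the threshold.

    Assumptions: the score lives in `field`, and the comparison is
    inclusive (>=). The real Filter class may behave differently.
    """
    for doc in docs:
        if doc[field] >= threshold:
            yield doc

# Toy corpus with made-up quality scores
corpus = [
    {"docno": "d1", "text": "a well-formed, informative passage", "quality": 0.95},
    {"docno": "d2", "text": "click here click here", "quality": 0.12},
    {"docno": "d3", "text": "another useful passage", "quality": 0.80},
]

# Only the surviving documents would be passed on to an indexer
kept = list(filter_by_quality(corpus, 0.8))
print([doc["docno"] for doc in kept])
```

Because pruning happens at indexing time, the resulting index is smaller and nothing about the retrieval step needs to change. The trade-off is that a pruned passage can never be retrieved, so the threshold is worth validating on held-out queries.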
This repository accompanies the paper Neural Passage Quality Estimation for Static Pruning, published at SIGIR 2024. If you use this work, please cite:
@inproceedings{DBLP:conf/sigir/ChangMMM24,
author = {Xuejun Chang and
Debabrata Mishra and
Craig Macdonald and
Sean MacAvaney},
title = {Neural Passage Quality Estimation for Static Pruning},
booktitle = {Proceedings of the 47th International {ACM} {SIGIR} Conference on
Research and Development in Information Retrieval, {SIGIR} 2024},
publisher = {{ACM}},
year = {2024},
url = {https://doi.org/10.1145/3626772.3657765},
doi = {10.1145/3626772.3657765}
}