Commit
* DS_Store in gitignore
* license
* pyproject.toml instead of setup.py
* pt_docs
* test and deploy actions
* coverage target
* faiss as dev dependency
* style check
* ruff fixes
* fix ruff errors in indexes.py
* dev dependencies
1 parent ac4ed9d, commit 6909587
Showing 15 changed files with 238 additions and 125 deletions.
File renamed without changes.
This file was deleted.
@@ -0,0 +1,36 @@
name: style

on:
  push: {branches: [master]} # pushes to master
  pull_request: {} # all PRs

jobs:
  ruff:
    strategy:
      matrix:
        python-version: ['3.10']
        os: ['ubuntu-latest']

    runs-on: ${{ matrix.os }}
    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Install Python
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}

    - name: Cache Dependencies
      uses: actions/cache@v4
      with:
        path: ${{ env.pythonLocation }}
        key: ${{ matrix.os }}-${{ matrix.python-version }}-${{ hashFiles('requirements.txt', 'requirements-dev.txt') }}

    - name: Install Dependencies
      run: |
        pip install --upgrade -r requirements-dev.txt
        pip install -e .

    - name: Ruff
      run: 'ruff check --output-format=github pyterrier_dr'
@@ -0,0 +1,62 @@
name: test

on:
  push: {branches: [master]} # pushes to master
  pull_request: {} # all PRs
  schedule: [cron: '0 12 * * 3'] # every Wednesday at noon

jobs:
  pytest:
    strategy:
      matrix:
        os: ['ubuntu-latest']
        python-version: ['3.8', '3.12']

    runs-on: ${{ matrix.os }}
    env:
      runtag: ${{ matrix.os }}-${{ matrix.python-version }}

    steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Install Python ${{ matrix.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}

    - name: Cache Dependencies
      uses: actions/cache@v4
      with:
        path: ${{ env.pythonLocation }}
        key: ${{ env.runtag }}-${{ hashFiles('requirements.txt', 'requirements-dev.txt') }}

    - name: Loading Torch models from cache
      uses: actions/cache@v3
      with:
        path: /home/runner/.cache/
        key: model-cache

    - name: Install Dependencies
      run: |
        pip install --upgrade -r requirements.txt -r requirements-dev.txt
        pip install -e .

    - name: Unit Test
      run: |
        pytest --durations=20 -p no:faulthandler --json-report --json-report-file ${{ env.runtag }}.results.json --cov pyterrier_dr --cov-report json:${{ env.runtag }}.coverage.json tests/

    - name: Upload Test Results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        path: ${{ env.runtag }}.*.json
        overwrite: true

    - name: Report Test Results
      if: always()
      run: |
        printf "**Test Results**\n\n" >> $GITHUB_STEP_SUMMARY
        jq '.summary' ${{ env.runtag }}.results.json >> $GITHUB_STEP_SUMMARY
        printf "\n\n**Test Coverage**\n\n" >> $GITHUB_STEP_SUMMARY
        jq '.files | to_entries[] | " - `" + .key + "`: **" + .value.summary.percent_covered_display + "%**"' -r ${{ env.runtag }}.coverage.json >> $GITHUB_STEP_SUMMARY
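The last step of the test workflow turns the coverage JSON report into one Markdown bullet per file using jq. As a rough illustration of what that filter emits, the Python sketch below applies the same transformation to a small made-up report. The file paths and percentages are hypothetical sample data, and the only structure assumed is the `files` / `summary` / `percent_covered_display` layout that the jq expression itself reads.

```python
import json


def coverage_summary_lines(coverage_report):
    """Return one Markdown bullet per file, mirroring the jq filter in the workflow."""
    lines = []
    for path, info in coverage_report["files"].items():
        percent = info["summary"]["percent_covered_display"]
        lines.append(f" - `{path}`: **{percent}%**")
    return lines


# Hypothetical sample in the shape the jq expression expects; not real results.
sample_report = json.loads("""
{
  "files": {
    "pyterrier_dr/util.py": {"summary": {"percent_covered_display": "87"}},
    "pyterrier_dr/indexes.py": {"summary": {"percent_covered_display": "42"}}
  }
}
""")

for line in coverage_summary_lines(sample_report):
    print(line)
```

In the workflow the equivalent lines are appended to `$GITHUB_STEP_SUMMARY`, so they render as a bulleted list on the run's summary page.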
@@ -129,3 +129,5 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+.DS_Store
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024, Sean MacAvaney

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -1 +1 @@
-include requirements.txt
+recursive-include pyterrier_dr *.rst
@@ -0,0 +1,43 @@
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "pyterrier-dr"
description = "Dense Retrieval for PyTerrier"
requires-python = ">=3.8"
authors = [
    {name = "Sean MacAvaney", email = "[email protected]"},
]
maintainers = [
    {name = "Sean MacAvaney", email = "[email protected]"},
]
readme = "README.rst"
classifiers = [
    "Programming Language :: Python",
    "Operating System :: OS Independent",
    "Topic :: Text Processing",
    "Topic :: Text Processing :: Indexing",
    "License :: OSI Approved :: MIT License",
]
dynamic = ["version", "dependencies"]

[tool.setuptools.dynamic]
version = {attr = "pyterrier_dr.__version__"}
dependencies = {file = ["requirements.txt"]}

[project.optional-dependencies]
bgem3 = [
    "FlagEmbedding",
]

[tool.setuptools.packages.find]
exclude = ["tests"]

[project.urls]
Repository = "https://github.com/terrierteam/pyterrier_dr"
"Bug Tracker" = "https://github.com/terrierteam/pyterrier_dr/issues"

[project.entry-points."pyterrier.artifact"]
"dense_index.flex" = "pyterrier_dr:FlexIndex"
"cde_cache.np_pickle" = "pyterrier_dr:CDECache"
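The `[project.entry-points."pyterrier.artifact"]` table in the new pyproject.toml maps two names to `module:attribute` targets. The sketch below, using only the standard library, shows how such values are structured. How PyTerrier itself consumes this group is not shown in this commit, so treat the loop as an assumption-level illustration; `split_target` is a hypothetical helper and nothing is actually imported.

```python
from importlib.metadata import EntryPoint

# The two entries declared under [project.entry-points."pyterrier.artifact"].
declared = {
    "dense_index.flex": "pyterrier_dr:FlexIndex",
    "cde_cache.np_pickle": "pyterrier_dr:CDECache",
}


def split_target(value):
    """Split a 'module:attribute' entry-point value into (module, attribute)."""
    module, _, attr = value.partition(":")
    return module, attr


# Build EntryPoint objects by hand; an installed package would expose the same
# data through importlib.metadata once its metadata is on the path.
entry_points = [
    EntryPoint(name=name, value=value, group="pyterrier.artifact")
    for name, value in declared.items()
]

for ep in entry_points:
    module, attr = split_target(ep.value)
    # ep.load() would import `module` and return `attr`; it is not called here
    # because that requires pyterrier_dr to be installed.
    print(f"{ep.name}: {attr} from {module}")
```

Declaring the classes as entry points lets a host package look them up by name without importing `pyterrier_dr` until one is actually requested.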
@@ -1,12 +1,17 @@
 __version__ = '0.2.0'
 
-from .util import SimFn, infer_device
-from .indexes import DocnoFile, NilIndex, NumpyIndex, RankedLists, FaissFlat, FaissHnsw, MemIndex, TorchIndex
-from .flex import FlexIndex
-from .biencoder import BiEncoder, BiQueryEncoder, BiDocEncoder, BiScorer
-from .hgf_models import HgfBiEncoder, TasB, RetroMAE
-from .sbert_models import SBertBiEncoder, Ance, Query2Query, GTR
-from .tctcolbert_model import TctColBert
-from .electra import ElectraScorer
-from .bge_m3 import BGEM3, BGEM3QueryEncoder, BGEM3DocEncoder
-from .cde import CDE, CDECache
+from pyterrier_dr.util import SimFn, infer_device
+from pyterrier_dr.indexes import DocnoFile, NilIndex, NumpyIndex, RankedLists, FaissFlat, FaissHnsw, MemIndex, TorchIndex
+from pyterrier_dr.flex import FlexIndex
+from pyterrier_dr.biencoder import BiEncoder, BiQueryEncoder, BiDocEncoder, BiScorer
+from pyterrier_dr.hgf_models import HgfBiEncoder, TasB, RetroMAE
+from pyterrier_dr.sbert_models import SBertBiEncoder, Ance, Query2Query, GTR
+from pyterrier_dr.tctcolbert_model import TctColBert
+from pyterrier_dr.electra import ElectraScorer
+from pyterrier_dr.bge_m3 import BGEM3, BGEM3QueryEncoder, BGEM3DocEncoder
+from pyterrier_dr.cde import CDE, CDECache
+
+__all__ = ["FlexIndex", "DocnoFile", "NilIndex", "NumpyIndex", "RankedLists", "FaissFlat", "FaissHnsw", "MemIndex", "TorchIndex",
+           "BiEncoder", "BiQueryEncoder", "BiDocEncoder", "BiScorer", "HgfBiEncoder", "TasB", "RetroMAE", "SBertBiEncoder", "Ance",
+           "Query2Query", "GTR", "TctColBert", "ElectraScorer", "BGEM3", "BGEM3QueryEncoder", "BGEM3DocEncoder", "CDE", "CDECache",
+           "SimFn", "infer_device"]
@@ -1,9 +1,11 @@
-from .core import FlexIndex, IndexingMode
-from .np_retr import *
-from .torch_retr import *
-from .corpus_graph import *
-from .faiss_retr import *
-from .scann_retr import *
-from .ladr import *
-from .gar import *
-from .voyager_retr import *
+from pyterrier_dr.flex.core import FlexIndex, IndexingMode
+from pyterrier_dr.flex import np_retr
+from pyterrier_dr.flex import torch_retr
+from pyterrier_dr.flex import corpus_graph
+from pyterrier_dr.flex import faiss_retr
+from pyterrier_dr.flex import scann_retr
+from pyterrier_dr.flex import ladr
+from pyterrier_dr.flex import gar
+from pyterrier_dr.flex import voyager_retr
+
+__all__ = ["FlexIndex", "IndexingMode", "np_retr", "torch_retr", "corpus_graph", "faiss_retr", "scann_retr", "ladr", "gar", "voyager_retr"]
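This hunk swaps wildcard imports for plain module imports, which changes which names end up in the package namespace. The self-contained sketch below illustrates the difference with a throwaway module; `demo_retr` and its `retriever` function are hypothetical stand-ins, not part of `pyterrier_dr`. That the change was motivated by the "ruff fixes" item in the commit message is an assumption; the diff itself does not say.

```python
import sys
import types

# Hypothetical stand-in for a submodule such as np_retr; its contents are invented.
demo = types.ModuleType("demo_retr")
exec("def retriever():\n    return 'ok'\n", demo.__dict__)
sys.modules["demo_retr"] = demo

# Old style: a wildcard import copies the module's public names into the namespace.
star_ns = {}
exec("from demo_retr import *", star_ns)

# New style: a plain import binds only the module object itself.
module_ns = {}
exec("import demo_retr", module_ns)

star_names = sorted(name for name in star_ns if not name.startswith("__"))
module_names = sorted(name for name in module_ns if not name.startswith("__"))

print(star_names)    # ['retriever']
print(module_names)  # ['demo_retr']
```

With the module-style import the submodule is still executed on import, so any side effects it has are preserved, while its individual names no longer leak into the package namespace and `__all__` can list exactly what is exported.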
@@ -0,0 +1,36 @@
Dense Retrieval for PyTerrier
=======================================================

Features to support Dense Retrieval in `PyTerrier <https://github.com/terrier-org/pyterrier>`__.

.. rubric:: Getting Started

.. code-block:: console
    :caption: Install ``pyterrier-dr`` with ``pip``

    $ pip install pyterrier-dr

Import ``pyterrier_dr``, load a pre-built index and model, and retrieve:

.. code-block:: python
    :caption: Basic example of using ``pyterrier_dr``

    >>> from pyterrier_dr import FlexIndex, TasB
    >>> index = FlexIndex.from_hf('macavaney/vaswani.tasb.flex')
    >>> model = TasB('sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco')
    >>> pipeline = model.query_encoder() >> index.np_retriever()
    >>> pipeline.search('chemical reactions')
             score  docno  docid  rank  qid               query
    0    95.841721   7049   7048     0    1  chemical reactions
    1    94.669395   9374   9373     1    1  chemical reactions
    2    93.520027   3101   3100     2    1  chemical reactions
    3    92.809227   6480   6479     3    1  chemical reactions
    4    92.376190   3452   3451     4    1  chemical reactions
    ..         ...    ...    ...   ...   ..                 ...
    995  82.554390   7701   7700   995    1  chemical reactions
    996  82.552139   1553   1552   996    1  chemical reactions
    997  82.551933  10064  10063   997    1  chemical reactions
    998  82.546890   4417   4416   998    1  chemical reactions
    999  82.545776   7120   7119   999    1  chemical reactions
@@ -1,5 +1,9 @@
pytest
pytest-subtests
pytest-cov
pytest-json-report
git+https://github.com/terrierteam/pyterrier_adaptive
voyager
FlagEmbedding
faiss-cpu
ruff