export ids during tokenisation #30

Open. Wants to merge 1 commit into base: master

Conversation

cmacdonald
Contributor

Optionally export ids during tokenisation.

@okhat
Collaborator

okhat commented Apr 12, 2021

Thanks Craig!

I'm not sure of the purpose of this:

if with_ids:
    #the masking code assumes that args.mask_punctuation is false.
    assert len(self.colbert.skiplist) == 0

This seems related to what we discussed by email. But we're moving toward masking punctuation by default and discouraging keeping punctuation tokens, so this assertion may conflict with that. Thoughts?
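One way to picture the concern: if punctuation tokens are dropped from the embeddings via a skiplist, exported ids only stay aligned with those embeddings when the same mask is applied to the ids. The following is a minimal sketch in plain Python under that assumption; `keep_mask`, `filter_by_mask`, and the token ids are hypothetical illustrations, not code from this PR or from ColBERT.

```python
# Hypothetical sketch: helper names and token ids are made up for illustration.

def keep_mask(token_ids, skiplist, pad_id=0):
    """Return True for each token that is neither padding nor in the skiplist."""
    return [(t not in skiplist) and (t != pad_id) for t in token_ids]


def filter_by_mask(values, mask):
    """Keep only the entries whose mask position is True."""
    return [v for v, keep in zip(values, mask) if keep]


# Made-up ids: 1010 and 999 stand in for punctuation, 0 for padding.
token_ids = [101, 7592, 1010, 2088, 999, 102, 0, 0]
# Placeholder strings stand in for per-token embedding vectors.
embeddings = [f"emb_{i}" for i in range(len(token_ids))]
skiplist = {1010, 999}

mask = keep_mask(token_ids, skiplist)
kept_embeddings = filter_by_mask(embeddings, mask)
kept_ids = filter_by_mask(token_ids, mask)

# Applying the same mask to both keeps ids and embeddings aligned one-to-one.
assert len(kept_ids) == len(kept_embeddings)
```

With an empty skiplist only padding is removed, which is the situation the assertion above enforces. Filtering the ids with the same mask would be one possible way to support a non-empty skiplist as well.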

@cmacdonald
Contributor Author

Good question, I'll look into this tomorrow.

@okhat
Collaborator

okhat commented Apr 20, 2021

By the way, and this is a bit off topic for this PR, can we remove the setup.py file? Are there still use-cases where pip is essential?

@cmacdonald
Contributor Author

Sorry for latency - other pressures.

I'm keen to keep setup.py. See for example: https://colab.research.google.com/github/cmacdonald/pyterrier_colbert/blob/main/vaswani.ipynb (or equivalently https://github.com/terrierteam/pyterrier_colbert/blob/main/vaswani.ipynb)

Our pyt_colbert wrapper just expresses a dependency on (our current fork of) ColBERT.
